What Improving DORA Metrics Actually Requires Beyond Faster Deployments

by Sophie Lane
Posted: Jun 07, 2026

When engineering teams start working on their DORA metrics, the first instinct is usually to focus on deployment frequency. Push more often. Automate more of the pipeline. Remove manual approval gates. The number goes up and the team feels like progress is being made.

Deployment frequency is the easiest DORA metric to move. It is also the one that creates the most problems when teams treat it as the primary goal rather than as one signal among four.

The four DORA metrics - deployment frequency, lead time for changes, change failure rate, and mean time to recovery - were designed to be read together. Improving one while neglecting the others does not produce better software delivery. It produces a more complicated version of the same problems, arriving faster.

Why Deployment Frequency Alone Is Not Enough

Teams that focus exclusively on increasing deployment frequency without addressing the underlying delivery system tend to follow a predictable path.

Deployments increase. Pipeline automation improves. Lead time comes down. The team celebrates the numbers. Then change failure rate starts creeping upward. More deployments mean more opportunities for regressions to reach production. Mean time to recovery starts climbing because incidents arrive faster than the team can work through them, and engineering time is consumed by incident response instead of feature delivery.

The net result is a team that ships more often and spends more time fixing what they shipped. The DORA metrics that are easy to measure improve. The DORA metrics that reflect actual delivery quality get worse.
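A back-of-the-envelope sketch makes the pattern concrete. Every figure below is hypothetical and chosen only for illustration, and the function names are made up for this example rather than taken from any DORA tooling.

```python
def expected_failed_deploys(deploys_per_week: float, change_failure_rate: float) -> float:
    """Expected number of deployments per week that need remediation."""
    return deploys_per_week * change_failure_rate


def weekly_recovery_hours(deploys_per_week: float,
                          change_failure_rate: float,
                          hours_to_recover: float) -> float:
    """Expected engineering hours per week spent restoring service."""
    failed = expected_failed_deploys(deploys_per_week, change_failure_rate)
    return failed * hours_to_recover


# Hypothetical "before": 5 deploys a week, 10% fail, 2 hours to recover each.
before = weekly_recovery_hours(5, 0.10, 2.0)

# Hypothetical "after": 25 deploys a week, failure rate drifts to 15%,
# and each recovery takes 3 hours (assumed, not measured).
after = weekly_recovery_hours(25, 0.15, 3.0)

print(f"before: {before:.2f} h/week, after: {after:.2f} h/week")
```

Under these assumed numbers, recovery work grows from about one hour a week to more than eleven, even though deployment frequency looks five times better.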

This pattern is not a failure of the DORA framework. It is a failure of treating deployment velocity as the goal rather than as one component of a healthy software delivery system.

What Lead Time for Changes Actually Measures

Lead time for changes measures the time between a code commit and that code running in production. Most teams understand this. Fewer teams understand where lead time actually comes from and which parts of it are worth optimizing.

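Lead time as teams experience it has three components. The time to write the code. The time to get the code reviewed and merged. The time to get the merged code through the pipeline and into production. Strictly, only the time after the commit counts toward the metric, but all three shape how quickly a change ships.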

AI coding assistants and productivity tooling have reduced the first component significantly for many teams. Pipeline automation has reduced the third component. The middle component - review and merge time - is where lead time most commonly stalls and where the improvement opportunities are least obvious.
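One way to see where lead time stalls is to split it at the merge point. The sketch below assumes each change records three timestamps (commit, merge, production deploy). The `Change` record and its field names are hypothetical, and the sample data is invented.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Change:
    committed_at: datetime   # first commit of the change
    merged_at: datetime      # review finished, change merged
    deployed_at: datetime    # change running in production


def hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def lead_time_breakdown(changes: list[Change]) -> dict[str, float]:
    """Median hours for the full lead time and for each stage after the commit."""
    if not changes:
        raise ValueError("need at least one change")
    return {
        "review_and_merge": median(hours(c.committed_at, c.merged_at) for c in changes),
        "pipeline": median(hours(c.merged_at, c.deployed_at) for c in changes),
        "total": median(hours(c.committed_at, c.deployed_at) for c in changes),
    }


# Invented sample: three changes with different review delays.
changes = [
    Change(datetime(2026, 6, 1, 9), datetime(2026, 6, 2, 9), datetime(2026, 6, 2, 10)),
    Change(datetime(2026, 6, 1, 12), datetime(2026, 6, 1, 18), datetime(2026, 6, 1, 19)),
    Change(datetime(2026, 6, 2, 8), datetime(2026, 6, 4, 8), datetime(2026, 6, 4, 10)),
]
breakdown = lead_time_breakdown(changes)
print(breakdown)
```

In this invented sample the median review-and-merge stage is 24 hours against a 1-hour pipeline stage, which is the shape the paragraph above describes. The median is used rather than the mean so that one unusually slow change does not dominate the picture.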

Code review bottlenecks are almost always downstream consequences of other problems. Large pull requests that are hard to review quickly. Test failures that require investigation before a reviewer will approve. Unclear change scope that creates uncertainty about whether approval is safe. Addressing lead time effectively means addressing these upstream causes rather than pressuring reviewers to approve faster.

Change Failure Rate Is the Metric Testing Quality Determines

Of the four core DORA metrics, change failure rate is the one most directly connected to software testing practices. It measures the percentage of deployments that cause a production failure requiring remediation.
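As a minimal sketch of that definition, the rate is the number of deployments that needed remediation divided by all deployments in the window. The `Deployment` record and its `needed_remediation` flag are assumptions for illustration. How a team decides that a deployment "caused a failure" is a policy choice this code does not settle.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    deploy_id: str
    needed_remediation: bool  # hypothetical flag: rollback, hotfix, or patch was required


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a production failure requiring remediation."""
    if not deployments:
        raise ValueError("no deployments in the measurement window")
    failed = sum(d.needed_remediation for d in deployments)
    return failed / len(deployments)


# Hypothetical window: 20 deployments, 3 of which needed remediation.
window = [Deployment(f"d{i}", needed_remediation=(i < 3)) for i in range(20)]
rate = change_failure_rate(window)
print(f"change failure rate: {rate:.0%}")
```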

Teams with strong regression testing coverage, accurate dependency validation, and integration tests that reflect current service behavior tend to have low change failure rates almost regardless of how frequently they deploy. Teams with weak testing infrastructure tend to have high change failure rates that get worse as deployment frequency increases.

The connection is structural. Regression testing is the mechanism that catches behavioral changes before they reach production. When regression coverage is comprehensive and accurate, most potential failures get caught in the pipeline. When regression coverage has gaps or runs against outdated representations of dependencies, such as stale mocks, failures that could have been caught in CI reach production instead.

Improving change failure rate sustainably requires improving the testing infrastructure that sits underneath it. Speed improvements to the pipeline without corresponding improvements to testing quality produce faster delivery of failures rather than faster delivery of reliable software.

The fifth DORA metric - reliability, added to the framework after the original four - captures this even more directly. Reliability measures whether services meet their availability and performance targets in production. It is the cumulative signal of how well the delivery system is maintaining system stability under the pace of change.
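One simple way to make "meets its availability target" concrete is to compare measured availability over a window with a target percentage. This is a sketch under that assumption only. The 99.9% target, the 30-day window, and the function names are illustrative and not part of the DORA definition.

```python
def availability(total_minutes: float, downtime_minutes: float) -> float:
    """Fraction of the window during which the service was available."""
    if total_minutes <= 0:
        raise ValueError("window must be longer than zero minutes")
    return 1 - downtime_minutes / total_minutes


def meets_target(total_minutes: float, downtime_minutes: float, target: float) -> bool:
    """True when measured availability is at or above the target."""
    return availability(total_minutes, downtime_minutes) >= target


# Hypothetical 30-day window with an assumed 99.9% availability target.
WINDOW_MINUTES = 30 * 24 * 60   # 43,200 minutes
TARGET = 0.999                  # allows roughly 43 minutes of downtime

print(meets_target(WINDOW_MINUTES, downtime_minutes=40, target=TARGET))
print(meets_target(WINDOW_MINUTES, downtime_minutes=50, target=TARGET))
```

With these assumed numbers, 40 minutes of downtime stays inside the target and 50 minutes does not. Performance targets such as latency would need a separate check along the same lines.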

Mean Time to Recovery Reflects Observability Investment

Mean time to recovery measures how long it takes to restore service after a production failure. It is the DORA metric most directly influenced by observability infrastructure rather than by testing or deployment practices.
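As a minimal sketch, the metric is the average of restore time minus failure start time across incidents. The timestamps below are invented, and representing an incident as a (started, restored) pair is an assumption for this example.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average time from failure start to service restoration."""
    if not incidents:
        raise ValueError("no incidents in the measurement window")
    total = sum((restored - started for started, restored in incidents), timedelta())
    return total / len(incidents)


# Invented incidents lasting 30, 90, and 60 minutes.
incidents = [
    (datetime(2026, 6, 1, 10, 0), datetime(2026, 6, 1, 10, 30)),
    (datetime(2026, 6, 3, 14, 0), datetime(2026, 6, 3, 15, 30)),
    (datetime(2026, 6, 5, 9, 0), datetime(2026, 6, 5, 10, 0)),
]
mttr = mean_time_to_recovery(incidents)
print(f"mean time to recovery: {mttr}")
```

A mean is sensitive to a single long outage, so teams with few incidents may want to look at the individual durations as well.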

Teams that invest in distributed tracing, structured logging, and meaningful alerting can identify and isolate production failures in minutes. Teams that rely on basic monitoring and manual log review spend hours diagnosing the same failures.

The observability infrastructure that determines mean time to recovery needs to be in place before deployment frequency increases rather than after. Teams that accelerate deployments without corresponding observability investment find that each production incident takes longer to resolve as the system becomes more complex and failures become harder to trace.

Mean time to recovery also reflects incident response process maturity. Clear runbooks, practiced incident response procedures, and defined escalation paths reduce recovery time independently of technical tooling. The teams with the lowest mean time to recovery tend to have both good observability tooling and well-practiced incident response processes. Neither alone produces the same result as both together.

The Reliability Dimension Most Teams Underinvest In

Reliability as a DORA metric captures something the other four metrics do not - the cumulative effect of delivery system health on production stability over time.

A team can have healthy deployment frequency, reasonable lead time, acceptable change failure rate, and fast mean time to recovery while still experiencing gradual reliability degradation. This happens when the pace of change consistently exceeds what the testing and validation infrastructure can absorb safely.

Individual deployments pass their gates. No single change produces a dramatic failure. The aggregate effect of many changes shipping faster than the system can maintain stable behavior produces the reliability degradation that this metric captures.

Teams that monitor reliability alongside the other four DORA metrics catch this pattern earlier than teams that treat reliability as an outcome of the other metrics rather than as an independent signal. A reliability trend that is moving in the wrong direction while the other metrics look healthy is almost always a signal that the testing and validation infrastructure is not keeping pace with delivery velocity.
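The divergence described above can be checked mechanically. The sketch below is deliberately simplified: it uses weekly availability as the reliability signal and change failure rate as the only stand-in for "the other metrics". The 15% threshold and all sample values are hypothetical.

```python
def slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def reliability_is_diverging(weekly_availability: list[float],
                             weekly_change_failure_rate: list[float],
                             max_cfr: float = 0.15) -> bool:
    """True when availability trends down while change failure rate stays within max_cfr."""
    others_look_healthy = all(rate <= max_cfr for rate in weekly_change_failure_rate)
    return others_look_healthy and slope(weekly_availability) < 0


# Hypothetical four weeks: failure rate looks fine, availability keeps slipping.
availability_by_week = [0.9995, 0.9993, 0.9990, 0.9986]
cfr_by_week = [0.08, 0.10, 0.09, 0.10]
print(reliability_is_diverging(availability_by_week, cfr_by_week))
```

A real check would likely include the remaining metrics and a longer window, since four points are easy to over-read.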

What Actually Moves All Five Metrics in the Right Direction

The teams that improve all five DORA metrics simultaneously - not just the easy ones - tend to share a common approach. They treat testing infrastructure, observability investment, and deployment automation as interdependent components of the delivery system rather than as separate concerns.

Deployment frequency improves sustainably when the testing infrastructure catches regressions reliably enough that more frequent deployments do not produce proportionally more failures.

Lead time for changes improves when code review bottlenecks are addressed upstream rather than downstream, and when pipeline execution is fast enough to provide feedback before reviewers move on to other work.

Change failure rate improves when regression coverage is comprehensive and current - covering not just the happy path but the integration scenarios, boundary conditions, and dependency interactions that are most likely to regress under production conditions.

Mean time to recovery improves when observability infrastructure is in place before it is needed and when incident response processes are practiced rather than improvised.

Reliability improves when all of the above are functioning well enough that the pace of delivery does not consistently outstrip the system's capacity to maintain stable behavior.

DORA metrics are not a checklist. They are a system. Improving them requires treating them as one.

About the Author

I’m Sophie Lane. I’m passionate about simplifying API testing and test automation and about improving the overall developer experience.
