SaaS & Growth · 10 min read · January 29, 2024

The Engineering Metrics That Actually Predict Outcomes (And the Ones That Lie)

Most engineering dashboards measure activity rather than impact. Here's the difference — and the metrics that actually predict whether your engineering organization is healthy.

Engineering metrics are political as much as they are technical. The metrics you track determine what gets optimized, what gets reported to leadership, and what behaviors get rewarded implicitly. Tracking the wrong metrics doesn't just produce misleading dashboards — it produces misaligned incentives that shape engineering culture in ways that take years to undo.

The history of software engineering is littered with metrics that were optimized into irrelevance: lines of code (which incentivizes verbose code), bug count (which incentivizes not filing bugs), velocity in story points (which incentivizes inflating estimates), and test coverage percentage (which incentivizes writing tests that execute code without asserting anything meaningful). Each of these metrics looks reasonable in isolation and produces perverse outcomes when optimized.

The DORA Metrics: The Best Available Framework

The DevOps Research and Assessment metrics — deployment frequency, lead time for changes, change failure rate, and time to restore service — represent the most robustly validated framework for measuring software delivery performance. They're based on years of empirical research across thousands of engineering organizations and are predictive of both business outcomes and organizational health in ways that most engineering metrics aren't.

Deployment frequency measures how often code reaches production. High-performing teams deploy multiple times per day. Low-performing teams deploy monthly or quarterly. The correlation between deployment frequency and code quality is counterintuitive but consistent: teams that deploy more frequently have lower change failure rates, because small frequent changes are easier to review, test, and roll back than large infrequent ones.
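As a rough sketch, deployment frequency can be computed from a log of deploy timestamps. The function and variable names here are illustrative assumptions, not part of any standard tooling:

```python
from datetime import date

def deploys_per_day(deploy_dates: list[date]) -> float:
    """Average deployments per calendar day over the observed window."""
    if not deploy_dates:
        return 0.0
    # Inclusive span: a single-day window counts as 1 day, not 0.
    span_days = (max(deploy_dates) - min(deploy_dates)).days + 1
    return len(deploy_dates) / span_days

# Example: 6 deploys spread over a 3-day window
dates = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 2),
         date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 3)]
print(deploys_per_day(dates))  # → 2.0
```

In practice the timestamps would come from your deploy pipeline's audit log rather than a hand-built list.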

Lead time for changes measures the time from commit to production. This is the aggregate of every quality gate, review cycle, and approval process in your pipeline. Long lead times indicate either slow processes or insufficient automation. The best teams achieve lead times of hours rather than days or weeks.
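A minimal way to compute this, assuming you can pair each production deploy with the commit it shipped (e.g. by SHA, which is not shown here), is to take the median of the commit-to-deploy deltas:

```python
from datetime import datetime
from statistics import median

def lead_time_hours(changes: list[tuple[datetime, datetime]]) -> float:
    """Median commit-to-production lead time in hours.

    Each tuple is (commit_time, deploy_time); pairing commits to
    deploys is assumed to happen upstream of this function.
    """
    deltas = [(deployed - committed).total_seconds() / 3600
              for committed, deployed in changes]
    return median(deltas)

changes = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 13, 0)),   # 4 h
    (datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 12, 0)),  # 2 h
    (datetime(2024, 1, 3, 8, 0), datetime(2024, 1, 3, 20, 0)),   # 12 h
]
print(lead_time_hours(changes))  # → 4.0
```

The median is a deliberate choice: one stalled change waiting weeks for approval would drag a mean far away from the typical developer experience.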

Change failure rate measures what fraction of deployments require a hotfix, rollback, or emergency patch. This is the most direct measure of code quality in production. Teams with strong review practices and automated quality gates consistently show change failure rates below 15%. Teams without these practices often see rates in the 30-45% range.
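The arithmetic is simple; the hard part is agreeing on what counts as a "failed" deploy. A sketch, assuming you already tag deploys that required remediation:

```python
def change_failure_rate(total_deploys: int, failed_deploys: int) -> float:
    """Fraction of deployments that needed a hotfix, rollback, or patch."""
    if total_deploys == 0:
        return 0.0
    return failed_deploys / total_deploys

# 40 deploys in a period, 4 of which required remediation
print(f"{change_failure_rate(40, 4):.0%}")  # → 10%, inside the <15% band
```

Note the interaction with deployment frequency: the same four failures against ten monthly deploys would read as 40%, which is one reason batching changes makes this metric look worse, not better.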

Time to restore service measures how long it takes to recover from a production incident. This reflects the quality of observability, incident response processes, and architectural decisions about rollback and failover. The difference between elite and low-performing teams on this metric is often an order of magnitude.
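Given incident records with start and resolution timestamps (field names here are assumptions, not a standard incident schema), time to restore is just the mean recovery duration:

```python
from datetime import datetime

def mean_time_to_restore_minutes(
    incidents: list[tuple[datetime, datetime]],
) -> float:
    """Mean minutes from incident start to service restoration."""
    durations = [(restored - started).total_seconds() / 60
                 for started, restored in incidents]
    return sum(durations) / len(durations)

incidents = [
    (datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 14, 30)),  # 30 min
    (datetime(2024, 1, 9, 2, 0), datetime(2024, 1, 9, 3, 30)),    # 90 min
]
print(mean_time_to_restore_minutes(incidents))  # → 60.0
```

The order-of-magnitude gap the text describes shows up exactly here: elite teams measure this in minutes, low performers in hours or days.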

Code Quality Metrics Worth Tracking

Beyond DORA, a handful of code-level metrics provide useful signal. Escaped defect rate — the fraction of bugs found in production versus in review or testing — directly measures the effectiveness of your pre-production quality process. Review cycle time — how long between PR submission and first substantive feedback — predicts both developer experience and velocity. Review iteration count — how many rounds of feedback a PR requires before merge — indicates the quality of PR descriptions and alignment between author and reviewer expectations.
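Escaped defect rate is the easiest of these to compute, provided your tracker records where each bug was found. A sketch under that assumption:

```python
def escaped_defect_rate(found_pre_prod: int, found_in_prod: int) -> float:
    """Fraction of all known defects that escaped to production."""
    total = found_pre_prod + found_in_prod
    if total == 0:
        return 0.0
    return found_in_prod / total

# 45 bugs caught in review or testing, 5 found in production
print(f"{escaped_defect_rate(45, 5):.0%}")  # → 10%
```

One caveat worth keeping in mind: like raw bug count, this metric can be gamed by not filing production bugs, so it is only trustworthy in a culture where filing them is cheap and blameless.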

What Not to Track

Never optimize for individual developer productivity metrics — lines written, tickets closed, commits per day. These metrics measure activity and incentivize the behaviors that look productive on a dashboard rather than the behaviors that produce good software. The engineer who spends a week redesigning a brittle subsystem that eliminates a class of production incidents has created more value than an engineer who closes twenty tickets of surface-level feature work — but the first engineer looks unproductive by most individual productivity metrics.

Engineering is a team sport. Measure the team's outputs — deployment frequency, quality, velocity — not the individual's activity. The team metrics optimize for collaboration and shared ownership. The individual metrics optimize for personal visibility at the expense of collective quality.
