Engineering · 8 min read · September 24, 2024

The Relationship Between Tests and Code Review That Most Teams Get Backwards

Tests and code review are both quality mechanisms — but they catch different things, at different times, with different costs. Most teams use them as substitutes. The best teams use them as complements.

"The tests pass" has become the implicit standard for "this code is ready to merge" in a large fraction of engineering organizations. This is a reasonable heuristic until you understand what tests actually guarantee — and what they fundamentally cannot.

Tests verify that the code behaves as the test author expected. They don't verify that the test author's expectations were correct, that the behavior is architecturally coherent, that the code doesn't have security implications that nobody thought to test, or that the approach is the right one for the problem. These gaps are exactly what code review exists to fill.

What Tests Are Good At

Tests are exceptional at verifying specific, predetermined behaviors consistently and cheaply over time. A test suite that covers 85% of code paths will catch regressions in those paths on every subsequent commit, for the lifetime of the codebase, at essentially zero marginal cost per run. This is an extraordinary property that human review cannot replicate — no engineer can re-verify all previously-working behavior on every PR.

Tests also excel at encoding the specific edge cases and failure modes that engineers discover during development and want to ensure are never reintroduced. Writing a test for a discovered edge case permanently institutionalizes that learning.

What Tests Are Poor At

Tests are poor at catching the things that human reviewers are good at: architectural misalignment, incorrect problem framing, security implications outside the test's scope, performance characteristics at scale, and the category of bugs that arise from incorrect assumptions that are consistently encoded in both the code and its tests.

The last category is the most dangerous. If an engineer has a fundamental misunderstanding of how a system works and writes both code and tests based on that misunderstanding, the tests will pass and the code will be wrong. The only thing that can catch this class of error is a reviewer who understands the system correctly and notices the discrepancy.
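A minimal, entirely hypothetical illustration of this failure mode: suppose an upstream billing system reports amounts in cents, but the engineer believes it reports dollars. The same wrong assumption lands in both the code and the test fixture, so CI is green and the code is wrong:

```python
# Hypothetical illustration: the upstream system reports amounts in
# CENTS, but the author assumes dollars. The misunderstanding is encoded
# in both the code and its test, so the test passes.

def total_in_dollars(line_items: list[int]) -> float:
    # WRONG: treats the raw values as dollars; they are actually cents.
    return float(sum(line_items))

def test_total():
    # The fixture is built from the same misunderstanding, so it passes.
    assert total_in_dollars([10, 20]) == 30.0
```

No amount of additional testing by the same author catches this; a reviewer who knows the billing system would immediately flag that `[1000, 2000]` cents should total 30.00 dollars, not 3000.0.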

The Complementary Model

The teams with the best combination of velocity and reliability use tests and review as deliberate complements. Tests handle regression coverage and behavior verification automatically. Review handles architectural coherence, correctness of intent, and the classes of issues that tests by definition cannot encode.

Concretely, this means treating "tests pass" as a necessary condition for merge but not a sufficient one — while also being honest about what the review is actually adding beyond CI. Reviews that degenerate into "style nit" exchanges when CI is green are reviews that have lost their function. The reviewer's job when tests pass is to ask the questions the tests didn't: Is this the right approach? Are there security implications outside the test scope? Are the assumptions being made here correct?
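The "necessary but not sufficient" policy can be enforced mechanically. One sketch, using GitHub's real branch-protection REST endpoint (the `OWNER/REPO` values and the `ci/tests` check name are placeholders, and your required review count may differ):

```shell
# Sketch: require both a green "ci/tests" status check AND at least one
# approving human review before anything merges into main.
curl -X PUT \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/repos/OWNER/REPO/branches/main/protection \
  -d '{
    "required_status_checks": {"strict": true, "contexts": ["ci/tests"]},
    "enforce_admins": true,
    "required_pull_request_reviews": {"required_approving_review_count": 1},
    "restrictions": null
  }'
```

The configuration only enforces that a review happens; keeping that review focused on intent and assumptions rather than style nits is the cultural half of the policy.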

AI Review as the Third Complement

Automated code review occupies a specific and valuable niche between tests and human review. It applies consistent pattern-checking across the entire diff — catching the security issues, the null dereferences, the performance anti-patterns — that neither tests (which verify intended behavior) nor human review (which focuses on architectural intent) systematically covers. The three mechanisms together create overlapping coverage that significantly reduces the probability of any class of issue surviving to production.

The teams that get this architecture right don't use AI review as a replacement for either tests or human review — they use it to fill the gap between them, where the pattern-level issues that neither tests nor intent-focused human review reliably catch tend to accumulate.

Try CodeMouse on your next PR

Free AI code review on every pull request. Bring your own API key — no subscription needed.

Install on GitHub — Free