Code Review Is Broken, and Diff Tools Are Part of the Problem

The line-by-line view creates a perverse incentive structure

Here is how most code review actually works: a developer opens a PR with 800 lines changed across 23 files. Their reviewer opens the GitHub diff view. They see a waterfall of red and green lines. They start at the top and scroll down.

What gets caught in this review? Typos. Inconsistent variable naming. A missing null check on line 47. A comment that doesn't match the code. These are real issues — but they are the least important issues in that PR.

What doesn't get caught? The fact that this PR introduces a third authentication path that now makes the auth system effectively unauditable. The fact that the abstraction chosen here will calcify in the codebase for years. The fact that there's already a utility in lib/helpers.ts that does exactly what this new function does. The line-by-line view doesn't show you any of that. It shows you lines.

Large PRs make line-by-line review nearly useless

Research on code review effectiveness is sparse and methodologically messy, but the data that exists consistently shows diminishing returns past 200-400 lines of change. After about 400 lines, defect detection rates drop off sharply. Reviewers get fatigued. Attention narrows to whatever is currently on screen.

Yet the typical PR in most organizations sits well outside that range, in one direction or the other. Either developers split their work into tiny, context-free micro-PRs (which creates a different problem: the reviewer can't see the forest for the trees), or they batch work into massive PRs that no one can meaningfully review.

The 800-line PR described above isn't a hypothetical. It's Tuesday. And the review process for that PR is theater: both parties know the reviewer isn't catching structural issues, but the box gets checked anyway. Code review becomes a ritual that provides false confidence rather than real quality assurance.

The split vs unified diff debate is the wrong argument

A significant amount of developer energy goes into arguing about whether split (side-by-side) or unified (single-column) diff is better for review. This is a real difference with real tradeoffs — split view makes it easier to compare old and new versions spatially; unified view is more compact and terminal-friendly — but it's a second-order concern.

Whether you see the before and after side by side or stacked, you're still looking at a line-by-line transformation. You're still not seeing the architecture. You're still not seeing the change in context of the rest of the codebase. The diff format is the binding constraint, and arguing about the layout of that diff is rearranging deck chairs.

That said: split view is better for most structural changes, and unified is better for small line edits. The best reviewers switch modes based on what they're looking at. The worst reviewers pick one and stick with it forever.

The case for semantic diff

The thing we actually want from a diff is a description of semantic change — what does this code now do differently? Text diff gives us syntactic change — what characters are different? These are related but not identical, and the gap matters enormously.

Semantic diff tools exist and are underused. For languages with good AST tooling, you can diff at the abstract syntax tree level: detect that a function was renamed (not deleted and re-added), that a parameter was added to a method signature, that a conditional was inverted. Tools like GumTree for Java and Python, or difftastic for a wide range of languages, produce diffs that align with the structure of the code rather than its characters.

# difftastic: syntax-aware diff that understands code structure
# Install: cargo install difftastic
# Use with git:
GIT_EXTERNAL_DIFF=difft git diff HEAD~1

# Or set as default:
git config --global diff.external difft

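Difftastic supports more than 30 languages and produces diffs that align with syntactic structure. A renamed function shows up as a changed identifier, not as a deleted block and a re-added one. A reformatted block produces no diff when the parsed syntax is unchanged. Strictly speaking this is a structural comparison of syntax trees rather than an analysis of behavior, but it is far closer to what a reviewer cares about than a character-level diff. This is what code review tooling should look like by default.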
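To make the gap between a line diff and a structure-aware diff concrete, here is a minimal sketch. The file names and the total function are invented for illustration, and the difft step only runs if difftastic happens to be installed.

```shell
# Two versions of the same function that differ only in layout.
# (old.py, new.py and total() are made-up names for this sketch.)
workdir=$(mktemp -d)
printf 'def total(a, b):\n    return a + b\n' > "$workdir/old.py"
printf 'def total(\n    a,\n    b\n):\n    return a + b\n' > "$workdir/new.py"

# Count added/removed lines in a plain line-based diff,
# skipping the '---' and '+++' file headers.
text_diff_lines=$(diff -u "$workdir/old.py" "$workdir/new.py" | grep -c '^[+-][^+-]')
echo "line-based diff: $text_diff_lines changed lines"

# Structure-aware comparison, only if difftastic is available.
if command -v difft >/dev/null 2>&1; then
    difft "$workdir/old.py" "$workdir/new.py"
fi
```

The line-based diff reports five changed lines for a change that does not alter the code's structure at all. A syntax-aware tool should report little or nothing here, which is the behavior a reviewer wants for pure reformatting.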

The adversarial dynamic nobody talks about

There is a social problem embedded in the standard code review format. The diff presents a reviewer with a fixed set of changes and asks them to find fault. This is structurally adversarial: one party has done work, another party is hunting for problems with it. Even with the best intentions, this creates a dynamic where critical feedback feels like an attack.

The nitpicking that line-by-line review encourages makes this worse. If the reviewer can't see architectural issues (because the format doesn't show them), they leave comments on the things they can see: style, naming, minor logic issues. The author receives ten comments about variable names and zero substantive architectural feedback. They fix the variable names, the review is approved, and the structural issue ships.

Teams that do code review well have usually figured out implicitly that the review conversation should happen before the PR is submitted — in a design document, a brief architecture discussion, a quick Slack exchange about approach. By the time the diff appears, the big decisions are already made and agreed on. The diff review then focuses on execution quality, where line-by-line is actually appropriate.

The counterargument: line diffs ARE useful for small, precise changes

To be fair to the format: for a 20-line bug fix, line-by-line diff is exactly the right tool. You want to see exactly what changed, character by character. A semantic diff for a typo fix in a comment would be absurd overkill. The diff format is not wrong — it is wrong as the universal default for all review scenarios.

Security fixes, hot patches, dependency updates — these benefit enormously from precise line-level review. The problem is that teams apply the same review process to a one-line bug fix and a multi-week feature. The format should adapt to the change size and nature. It almost never does.
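One way to picture a review format that adapts to change size is a small helper that suggests a mode from the number of changed lines. This is only a sketch: the function name, the mode labels, and both thresholds are assumptions chosen for illustration (20 lines echoes the small bug fix above, 400 the point where review effectiveness drops off), not an established rule.

```shell
# Hypothetical thresholds; tune them to your team's norms.
SMALL_CHANGE_MAX=20
LARGE_CHANGE_MIN=400

# Print a suggested review mode for a given number of changed lines.
suggest_review_mode() {
    changed=$1
    if [ "$changed" -le "$SMALL_CHANGE_MAX" ]; then
        echo "line-diff"          # precise, character-level reading
    elif [ "$changed" -lt "$LARGE_CHANGE_MIN" ]; then
        echo "structural-diff"    # e.g. a syntax-aware diff tool
    else
        echo "checkout-and-read"  # read the files and run the code
    fi
}

suggest_review_mode 12
suggest_review_mode 800
```

Size is a crude proxy for the nature of a change. A one-line security fix and a one-line typo fix deserve different scrutiny, so a real version would also look at which paths were touched.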

What good review actually looks like

The developers who do code review well are opinionated about their tools. They know when to use split view vs unified. They know when to check out the branch locally and run it rather than reading a diff. They ask for architectural context before reviewing a large PR. They separate "design review" (pre-implementation) from "implementation review" (the diff).

The organizational intervention that actually works is PR size limits. Not hard limits — culture doesn't respond well to hard rules on creative work — but norms. If the default expectation is that a PR is reviewable in under 30 minutes, authors decompose their work accordingly. This forces design decisions to happen in conversations before code is written, which is exactly where they should happen.
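A size norm is easier to keep when authors can check it before opening the PR. The sketch below sums added and deleted lines from git diff --numstat and prints a warning rather than failing, in keeping with the idea of a norm instead of a hard rule. The function names, the 400-line soft limit, and the origin/main base ref are all assumptions for illustration.

```shell
# Sum added + deleted lines from `git diff --numstat` output on stdin.
# Binary files show "-" in both count columns and are skipped.
count_changed_lines() {
    awk '{ if ($1 != "-") total += $1 + $2 } END { print total + 0 }'
}

# Hypothetical soft limit; a norm, not an enforced rule.
SOFT_LIMIT=400

# Warn (but do not fail) when the current branch exceeds the soft limit.
# $1 is the base ref to compare against, e.g. origin/main (assumed name).
check_pr_size() {
    changed=$(git diff --numstat "$1"...HEAD | count_changed_lines)
    if [ "$changed" -gt "$SOFT_LIMIT" ]; then
        echo "warning: $changed changed lines, soft limit is $SOFT_LIMIT"
    else
        echo "ok: $changed changed lines"
    fi
}
```

Run from inside a repository, check_pr_size origin/main compares the current branch against its merge base with that ref. Line count is only a stand-in for "reviewable in under 30 minutes", so treat the number as a prompt for a conversation, not a verdict.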

Better tooling matters too. Use difftastic for structure-aware diffs. Use git diff --color-moved to highlight code that was moved rather than changed. Use git's histogram diff algorithm instead of the default Myers. And when reviewing a structural change, check out the code and read the files, not just the diff. The diff tool is a starting point, not the complete picture.
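The git settings named above can be made the default with two config lines. This is a sketch assuming a reasonably recent git. The zebra mode is one of several accepted values for color-moved and is an arbitrary choice here.

```shell
# Use the histogram algorithm instead of the default Myers.
git config --global diff.algorithm histogram

# Color moved blocks differently from added/removed lines.
git config --global diff.colorMoved zebra

# The same options for a single invocation, without changing config:
#   git diff --diff-algorithm=histogram --color-moved=zebra HEAD~1
```

Both settings affect only how git's built-in diff is computed and colored. They do not apply to an external tool such as difftastic.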
