Myers, Patience, Histogram: The Diff Algorithms Behind Git Explained

The diff problem is harder than it looks

Given two sequences of lines, find the shortest edit script that transforms one into the other. That's the core problem behind diffing, and it's the one Myers targets directly; Patience and Histogram do not guarantee a strict minimum and favor better alignment instead. The catch: "shortest" doesn't mean "most readable." There are often many edit scripts of equal minimum length, and the algorithm's heuristics determine which one you see.
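To make "many edit scripts of equal minimum length" concrete, here is a small sketch that lists every position where the same block of added lines could be reported. The function name insertion_points and the toy file contents are made up for illustration, and the sketch only handles the simple case of one contiguous insertion.

```python
def insertion_points(old, new):
    """Return every index i where inserting new[i:i+k] into old at i yields new."""
    k = len(new) - len(old)  # number of lines the change added
    if k <= 0:
        return []
    points = []
    for i in range(len(old) + 1):
        # Candidate script: keep old[:i], insert k lines, keep old[i:].
        if old[:i] + new[i:i + k] + old[i:] == new:
            points.append(i)
    return points


old = ["f() {", "}", "", "h() {", "}"]
new = ["f() {", "}", "", "g() {", "}", "", "h() {", "}"]
print(insertion_points(old, new))  # [1, 2, 3]
```

All three positions describe a valid three-line insertion. Only the one at index 3 reports the new function as a single intact block; the other two split it across the neighboring closing brace or blank line.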

This isn't an academic concern. When you're reviewing a PR and the diff shows a function being deleted and re-added 40 lines later, that is not necessarily a rewrite: a line-based diff has no "move" operation, so a move appears as a delete plus an add, and the algorithm's choice of which lines to match decides how noisy the result looks. Understanding the trade-offs helps you know when to trust the diff and when to look again.

Myers: the algorithm Git uses by default

Eugene Myers published his diff algorithm in 1986, and it remains Git's default to this day. It searches for the shortest edit script directly, which is equivalent to finding a longest common subsequence (LCS) of the two files. Myers runs in O(ND) time, where N is the combined number of lines in both files and D is the size of the shortest edit script (the number of inserted and deleted lines), so it is fast when the files are similar, and it is memory-efficient.
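The O(ND) bound is easier to see in code. The following is a minimal sketch of the greedy search from Myers' paper that returns only D, the length of the shortest edit script; it is not Git's implementation, which adds heuristics and also recovers the script itself. The function name shortest_edit_length is an assumption made for this example.

```python
def shortest_edit_length(a, b):
    """Return D, the minimum number of line insertions plus deletions turning a into b."""
    n, m = len(a), len(b)
    # furthest[k] = largest x reached so far on diagonal k, where k = x - y
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (insert a line of b) or right (delete a line of a),
            # whichever starts from the further-reaching neighbor diagonal.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]
            else:
                x = furthest[k - 1] + 1
            y = x - k
            # Follow the "snake": matching lines cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


print(shortest_edit_length(list("ABCABBA"), list("CBABAC")))  # 5
```

The outer loop runs D + 1 times and each pass touches at most d + 1 diagonals, which is why the cost grows with the size of the change rather than with the size of the files alone.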

But correctness here means "produces a valid minimal diff," not "produces the diff a human would write." Consider refactoring a Python class that has a lot of boilerplate. If you move a method and rename a variable, the diff shows the whole method deleted in one place and added in another, because a line-based edit script has no "move" operation. And where many lines look alike (blank lines, return statements, closing brackets), Myers may match lines that are textually identical but belong to different logical blocks. It optimizes for edit count, not human comprehension.

# Myers diff of a simple refactor can look like this:
 def process_items(items):
-    result = []
-    for item in items:
-        result.append(item.transform())
-    return result
+    return [item.transform() for item in items]

# Four deletions, one addition. Correct and minimal, but it only tells you which lines changed, not that a loop became a comprehension.

Myers is optimized for machine performance. It's great when diffs are small and localized. It starts to fail you when code is refactored, moved, or restructured — exactly the cases where code review matters most.

Patience: diffing like a human would

Patience diff, developed by Bram Cohen (creator of BitTorrent), takes a fundamentally different approach. Instead of searching for a global LCS over every line, it first anchors the diff on unique lines: lines that appear exactly once in the old version and exactly once in the new one. It keeps the longest run of those anchors that appears in the same order in both files, then recursively diffs the segments between them.

The insight is that unique lines are very likely to be matched correctly. A function signature that appears once in each file almost certainly corresponds to the same function, so anchor there, then figure out the interior. This produces diffs that align with the structure of the code rather than with whichever look-alike lines happen to match.
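The anchoring step can be sketched in a few lines. This is an illustrative reconstruction, not Git's code: the name patience_anchors is hypothetical, and the sketch stops after choosing anchors rather than recursing into the gaps between them. It pairs up lines that are unique in both files, then keeps the longest in-order run of pairs using the patience-sorting technique that gives the algorithm its name.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(old, new):
    """Return (old_index, new_index) pairs for in-order lines unique to both files."""
    old_counts, new_counts = Counter(old), Counter(new)
    new_pos = {line: j for j, line in enumerate(new) if new_counts[line] == 1}
    pairs = [(i, new_pos[line]) for i, line in enumerate(old)
             if old_counts[line] == 1 and line in new_pos]

    # Longest increasing subsequence on new_index (pairs are already sorted by old_index).
    tail_js = []    # tail_js[p]: smallest final new_index of any run of length p + 1
    tail_idx = []   # index into pairs of that run's last element
    prev = [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        p = bisect_left(tail_js, j)
        if p == len(tail_js):
            tail_js.append(j)
            tail_idx.append(idx)
        else:
            tail_js[p] = j
            tail_idx[p] = idx
        prev[idx] = tail_idx[p - 1] if p > 0 else None

    # Walk the back-pointers from the end of the longest run.
    anchors = []
    idx = tail_idx[-1] if tail_idx else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = prev[idx]
    return anchors[::-1]


old = ["def a():", "    pass", "", "def b():", "    pass"]
new = ["def a():", "    pass", "", "def c():", "    pass", "", "def b():", "    pass"]
print(patience_anchors(old, new))  # [(0, 0), (3, 6)]
```

Only the two function signatures qualify as anchors here. The repeated "    pass" lines and the blank lines are ignored at this stage, which is what keeps them from pulling the alignment off course.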

# Enable patience diff for a single command:
git diff --diff-algorithm=patience HEAD~1

# Set it as the default in your git config:
git config --global diff.algorithm patience

Patience diff tends to produce much cleaner results when functions are added or removed from a file that has many similar-looking closing braces or boilerplate lines. The classic example is a C file where adding a new function causes Myers to match the closing } of one function against the } of its neighbor, so the hunk starts and ends in the middle of two functions. The result is valid but misleading. Patience keeps the structure intact.

Histogram: patience, but faster

Histogram diff is a refinement of Patience that originated in JGit (the Java Git implementation) and is now available in Git via --diff-algorithm=histogram. It handles the case where Patience struggles: files whose useful anchor lines occur a few times rather than exactly once.

Rather than requiring a line to appear exactly once, Histogram uses a frequency histogram of all lines and prefers to anchor on the least-common matching lines. This makes it more robust when code has repeated patterns — like a file full of similar getter methods — where Patience's "unique lines only" rule means it finds few anchors and falls back to something Myers-like.
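The core preference can be sketched as follows. This is a deliberate simplification and not the JGit or Git implementation: the real algorithm extends the chosen match into a region and recurses on both sides, while this hypothetical helper, rarest_common_line, only picks the first anchor candidate by occurrence count.

```python
from collections import Counter


def rarest_common_line(old, new):
    """Return (line, count) for the line shared by both files that is rarest in old."""
    old_counts = Counter(old)          # the "histogram" of line frequencies
    new_lines = set(new)
    best = None
    for line in old:                   # scan in file order so ties go to the earliest line
        if line not in new_lines:
            continue
        count = old_counts[line]
        if best is None or count < best[1]:
            best = (line, count)
    return best


# A file of look-alike getters: no line is unique, so Patience finds no anchors.
old = ["get() {", "}", "get() {", "}", "// ----", "// ----", "get() {", "}"]
new = ["get() {", "}", "// ----", "// ----", "get() {", "}", "set() {", "}"]
print(rarest_common_line(old, new))  # ('// ----', 2)
```

Even though nothing in this file is unique, the separator comment occurs less often than the getters and braces, so it is still a better place to anchor than an arbitrary closing brace.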

# Use histogram for a single diff:
git diff --diff-algorithm=histogram HEAD~1

# Or set globally (recommended for most codebases):
git config --global diff.algorithm histogram

# Combine with word diff for extra clarity:
git diff --diff-algorithm=histogram --word-diff HEAD~1

In practice, Histogram is the algorithm most likely to produce diffs that match your mental model of the change. If you're going to change one setting in your Git config today, this is it.

A concrete before/after: seeing the difference

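Here's a scenario that demonstrates the difference clearly. You have a file with two similar functions, and you add a new function between them. Several equally short diffs are valid, but their readability varies dramatically. On a file this small Git's algorithms may well agree with each other; the hunks below illustrate the two kinds of alignment, which tend to diverge on larger files.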

# Original file:
function validateUser(user) {
  return user.email && user.name;
}

function validateProduct(product) {
  return product.sku && product.name;
}

# New file (added validateOrder in between):
function validateUser(user) {
  return user.email && user.name;
}

function validateOrder(order) {
  return order.id && order.items.length > 0;
}

function validateProduct(product) {
  return product.sku && product.name;
}

# Myers diff (can misalign on the closing braces):
 function validateUser(user) {
   return user.email && user.name;
+}
+
+function validateOrder(order) {
+  return order.id && order.items.length > 0;
 }
 
 function validateProduct(product) {

# Histogram diff (correctly identifies the insertion point):
 function validateUser(user) {
   return user.email && user.name;
 }
 
+function validateOrder(order) {
+  return order.id && order.items.length > 0;
+}
+
 function validateProduct(product) {

The Histogram version makes it immediately obvious that one complete function was inserted. The Myers version describes the same change, but you have to work out for yourself where the new function begins and ends. When reviewing code, that cognitive overhead compounds across every hunk in every file.

Why this makes you a better code reviewer

Understanding what the diff algorithm optimizes for changes how you interpret diffs. When you see a massive diff that deletes and re-adds what looks like the same function, your first question should be: is this a genuine rewrite, or a move that a line-based diff can only express as a delete plus an add? You can verify by checking whether the deleted and added lines are textually identical.
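That check is mechanical enough to script. The sketch below is a rough heuristic under stated assumptions: the helper name is_pure_move is invented for this example, it compares lines textually rather than semantically, it ignores indentation and blank lines, and it assumes no changed line's own content starts with "--" or "++" (which would be mistaken for a file header).

```python
def is_pure_move(diff_lines):
    """True if the removed and added lines of a unified diff are the same text."""
    removed, added = [], []
    for line in diff_lines:
        if line.startswith(("---", "+++")):
            continue  # file header lines, not content
        if line.startswith("-"):
            removed.append(line[1:].strip())
        elif line.startswith("+"):
            added.append(line[1:].strip())
    # Blank lines carry no signal, so drop them before comparing.
    removed = [text for text in removed if text]
    added = [text for text in added if text]
    return bool(removed) and removed == added


moved = [
    "-def helper(x):",
    "-    return x + 1",
    "-",
    " def main():",
    "     pass",
    "+",
    "+def helper(x):",
    "+    return x + 1",
]
print(is_pure_move(moved))  # True
```

A True result means the hunk is a relocation and the review effort belongs elsewhere; a False result means something in the block really changed and deserves a line-by-line read.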

It also gives you options. If a diff looks confusing, rerun it with git diff --diff-algorithm=histogram. If code was moved within or between files, git diff --color-moved detects the moved blocks and highlights them in a different color. If the file itself was renamed, git log --follow -p -- <path> shows that single file's history across the rename.
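If you find yourself switching algorithms often, a small wrapper can run them side by side. This is a sketch only: the helper names diff_command and diff_sizes are hypothetical, the default revision HEAD~1 simply mirrors the commands shown earlier, and diff_sizes must be run inside a Git repository. The flags themselves (--diff-algorithm and --color-moved) are standard git diff options.

```python
import subprocess

# Values accepted by git diff --diff-algorithm.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(algorithm, revision="HEAD~1", color_moved=False):
    """Build the argv list for a git diff using the given algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    command = ["git", "diff", f"--diff-algorithm={algorithm}"]
    if color_moved:
        command.append("--color-moved")
    command.append(revision)
    return command


def diff_sizes(revision="HEAD~1"):
    """Run each algorithm and report how many lines of diff output it produces."""
    sizes = {}
    for algorithm in ALGORITHMS:
        result = subprocess.run(diff_command(algorithm, revision),
                                capture_output=True, text=True, check=True)
        sizes[algorithm] = len(result.stdout.splitlines())
    return sizes


print(diff_command("histogram"))
```

Line count is a crude proxy for readability, but a large gap between algorithms is a useful hint that one of them has matched look-alike lines across unrelated blocks.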

The diff algorithm is not a neutral reporter of truth. It's a heuristic that makes choices. Knowing those choices, and having the tools to override them, is the difference between passive diff reading and active code understanding. It's a small piece of knowledge that pays off in every review.
