JSON vs YAML: Stop Picking the Wrong One for the Wrong Job

The JSON vs YAML debate feels perpetual, but it was settled years ago. The problem is that most teams are still applying the conclusion incorrectly. YAML for human-written configuration. JSON for machine-generated data. Everything else is a smell. And YAML has some genuinely terrible design decisions that you need to know about before you trust it with production configuration.

The simple rule, and why teams ignore it

Here is the rule: if a human will write or edit the file more often than a machine will generate it, use YAML. If a machine generates it and a human reads it occasionally, use JSON. This maps naturally to their strengths — YAML's indentation-based syntax and comment support make it readable for complex nested configuration; JSON's strictness and universal parser support make it reliable for data interchange.

The rule gets ignored because tools made the decision for teams. Kubernetes chose YAML. GitHub Actions chose YAML. Ansible chose YAML. Users of those tools write YAML all day, then reach for it everywhere else because it is what they know: API responses, database fixtures, and inter-service communication, all places where JSON's strictness is a feature, not an obstacle.

Meanwhile, some teams use JSON for everything, including human-authored configuration, and then they fight the lack of comments by abusing "_comment" keys and explaining their settings in separate README files. That is also wrong.
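
A rough illustration of what the "_comment" workaround costs: the pseudo-comments are real data, so every consumer has to strip them before using the settings. The helper name strip_comment_keys and the sample settings are hypothetical, chosen only for this sketch.

```python
import json

# Hypothetical hand-written JSON config that fakes comments with "_comment" keys.
RAW = """
{
  "_comment": "Timeout is high because the upstream batch job is slow",
  "timeout_seconds": 120,
  "retry": {
    "_comment": "Three attempts matches the gateway limit",
    "attempts": 3
  }
}
"""


def strip_comment_keys(node):
    """Recursively drop "_comment" keys so only real settings remain."""
    if isinstance(node, dict):
        return {k: strip_comment_keys(v) for k, v in node.items() if k != "_comment"}
    if isinstance(node, list):
        return [strip_comment_keys(item) for item in node]
    return node


settings = strip_comment_keys(json.loads(RAW))
print(settings)
```

In YAML the same explanations would be ordinary # comments that never reach the parsed data.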

The Norway problem: YAML's implicit type coercion will hurt you

This is the one that bites Kubernetes cluster operators, CI pipeline authors, and Ansible playbook writers with regularity. YAML 1.1 (which is what most YAML parsers implement, despite YAML 1.2 existing since 2009) performs implicit type coercion on unquoted scalars. Some examples of what YAML 1.1 parsers will do to your data:

# YAML 1.1 implicit coercion — all of these become non-string values

country: NO          # -> false (boolean) — the Norway problem
country: YES         # -> true
verbose: on          # -> true
verbose: off         # -> false
version: 1.0         # -> float 1.0 (not string "1.0")
port: 8080           # -> integer 8080
octal: 010           # -> integer 8 (octal literal!)
hex: 0xFF            # -> integer 255

# The fix: quote values that look like other types
country: "NO"
version: "1.0"
port: "8080"

The "Norway problem" is named for the ISO 3166-1 alpha-2 country code for Norway (NO), which YAML 1.1 silently converts to a boolean false. Software that processes country codes, status flags with names like YES/NO/ON/OFF, or port numbers as strings is constantly at risk of this. The fix is quoting, but the problem is that you have to know to quote — YAML gives no warning that it performed coercion.

YAML 1.2 fixed most of this: only true and false are booleans, and octal literals require the 0o prefix. Numeric-looking scalars such as 1.0 and 8080 still resolve to numbers under 1.2, so version strings and ports still need quotes. Many parsers still default to 1.1 behavior for compatibility with a decade of existing YAML files, so check which spec version your parser implements before trusting it with unquoted values.
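
Because the parser gives no warning, one option is a lint step that flags risky unquoted scalars before they reach it. The sketch below approximates the YAML 1.1 rules with a few regular expressions. It is not a full resolver (it skips forms such as single-letter y/n, sexagesimal numbers, and timestamps), and the names yaml11_guess and needs_quotes are hypothetical.

```python
import re

# Approximation of YAML 1.1 implicit typing; deliberately incomplete.
_BOOL_WORDS = {"yes", "no", "on", "off", "true", "false"}
_NULL_WORDS = {"", "~", "null", "Null", "NULL"}
# Decimal, octal (leading 0), and hex integers.
_INT_RE = re.compile(r"[-+]?(0|[1-9][0-9_]*|0[0-7_]+|0x[0-9a-fA-F_]+)")
# Simple decimal floats such as 1.0 or 6.02e23.
_FLOAT_RE = re.compile(r"[-+]?[0-9][0-9_]*\.[0-9_]*([eE][-+]?[0-9]+)?")


def yaml11_guess(scalar: str) -> str:
    """Guess the type a YAML 1.1 parser would likely assign to an unquoted scalar."""
    if scalar in _NULL_WORDS:
        return "null"
    # YAML 1.1 accepts the lower, Title, and UPPER forms of the boolean words.
    if scalar.lower() in _BOOL_WORDS and scalar in (
        scalar.lower(),
        scalar.title(),
        scalar.upper(),
    ):
        return "bool"
    if _INT_RE.fullmatch(scalar):
        return "int"
    if _FLOAT_RE.fullmatch(scalar):
        return "float"
    return "str"


def needs_quotes(scalar: str) -> bool:
    """True when the value should be quoted to stay a string."""
    return yaml11_guess(scalar) != "str"


for value in ["NO", "on", "1.0", "8080", "010", "0xFF", "Norway"]:
    print(f"{value:8} {yaml11_guess(value):6} quote={needs_quotes(value)}")
```

A check like this could run over values before they are templated into a manifest; it complements quoting discipline and does not replace knowing which spec version the parser follows.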

The six (at least) ways to write a YAML string

JSON has exactly one way to write a string: double-quoted, with escape sequences for special characters. YAML has at least six distinct string literal syntaxes, with different behavior for newlines, trailing whitespace, and escape sequences:

# All of these are valid YAML strings with subtly different behavior:

bare: plain scalar string (no quotes needed if no special chars)
single: 'single quoted, backslash is literal: \n is two chars'
double: "double quoted, backslash works: \n is a newline"
literal: |
  literal block scalar
  preserves newlines
  trailing newline included
folded: >
  folded block scalar
  newlines become spaces
  except blank lines
chomped: |-
  literal with chomping indicator
  no trailing newline

This is not a feature. This is accidental complexity that makes YAML files subtly inconsistent across authors. When you have five engineers writing Kubernetes manifests, you will have five different string literal styles in the same repository. Code review becomes a debate about which quoting style is correct rather than whether the configuration is correct. JSON's single string syntax eliminates this entire class of inconsistency.
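
To make the differences concrete, the values that the quoting and block styles resolve to can be written out in JSON's single string syntax. The sketch below hard-codes the resolved values for tiny YAML snippets (derived by hand from the YAML rules, no YAML parser is run) and prints each one with Python's json module. The snippets in the comments are illustrative, not taken from a real manifest.

```python
import json

# Hand-derived values for small YAML snippets; nothing is parsed here.
resolved = {
    # single: 'a\nb'   -> backslash and n stay as two literal characters
    "single": "a\\nb",
    # double: "a\nb"   -> \n is an escape for one newline character
    "double": "a\nb",
    # literal: |       -> newlines kept, one trailing newline
    #   a
    #   b
    "literal": "a\nb\n",
    # folded: >        -> the line break between a and b folds to a space
    #   a
    #   b
    "folded": "a b\n",
    # chomped: |-      -> newlines kept, trailing newline stripped
    #   a
    #   b
    "chomped": "a\nb",
}

for name, value in resolved.items():
    # json.dumps renders every value in the one string form JSON allows.
    print(f"{name:8} {json.dumps(value)}")
```

Five spellings of what looks like the same two-line text produce four different values, which is the inconsistency a reviewer has to catch by eye.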

Significant whitespace: the tab problem and the indentation problem

YAML uses indentation to represent structure, which means whitespace is semantically significant. Tabs are forbidden as indentation (JSON treats them as insignificant whitespace between tokens). Mixing two-space and four-space indentation within a document can produce subtly wrong structure without a parse error: the content is valid YAML, just not what you intended. A stray space before a key can silently move it to the wrong level in the hierarchy.

# These parse to COMPLETELY different structures:

# 'database' maps to an object with two keys:
database:
  host: localhost
  port: 5432

# Three top-level keys: 'database' is null, and host and port are its siblings:
database:
host: localhost
port: 5432

# A common mistake in GitHub Actions:
steps:
  - name: Run tests
    run: npm test
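     env:           # Extra space: env no longer lines up with name and run, so it is not a key of the step;
       CI: true     # parsers typically report a syntax error here instead of attaching env to the step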
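
Written as the data a parser hands back, the first two variants look like this in Python. The sketch also shows one way a consumer might fail loudly instead of silently reading the wrong level; the function name database_host is hypothetical.

```python
# What the correctly indented document resolves to.
nested = {"database": {"host": "localhost", "port": 5432}}

# What the de-indented document resolves to: three sibling keys.
flat = {"database": None, "host": "localhost", "port": 5432}


def database_host(config: dict) -> str:
    """Return the database host, rejecting configs where 'database' is not a mapping."""
    database = config.get("database")
    if not isinstance(database, dict):
        raise ValueError("'database' must be a mapping; check the indentation of its keys")
    return database["host"]


print(database_host(nested))

try:
    database_host(flat)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Validating the shape of parsed configuration catches the indentation slip at load time, since the YAML parser itself accepts both documents.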

Where JSON's strictness is genuinely a feature

Everything that makes JSON annoying for humans (no comments, no trailing commas, mandatory double quotes, mandatory commas between elements) makes JSON reliable for machines. A JSON document means exactly one thing. There is no parser ambiguity, no version differences in coercion behavior, no whitespace sensitivity. Serialize two JSON documents the same way (sorted keys, fixed indentation) and any difference in the diff corresponds to a difference in the data, not to how one parser interpreted a string literal differently from another.
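
A small sketch of that strictness in practice, using Python's standard json module as one example parser. The sample documents and the helper name rejected are made up for illustration.

```python
import json

# Each document breaks one of the rules listed above.
MALFORMED = {
    "trailing comma": '{"port": 5432,}',
    "comment": '{"port": 5432} // default port',
    "single quotes": "{'port': 5432}",
    "missing comma": '{"host": "localhost" "port": 5432}',
}


def rejected(doc: str) -> bool:
    """Return True if the standard-library parser refuses the document."""
    try:
        json.loads(doc)
    except json.JSONDecodeError:
        return True
    return False


for label, doc in MALFORMED.items():
    print(f"{label:15} rejected={rejected(doc)}")
```

Each of these fails at parse time rather than producing a value the author did not intend.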

This matters enormously for data pipelines, API responses, configuration that is generated by code rather than written by hand, and anywhere that downstream consumers must agree on the exact content of a document. JSON's constraints are not limitations — they are the guarantee that the format actually works as a universal interchange format.
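
One way to get diffs that reflect only data changes is to normalize both documents before comparing them. This is a minimal sketch assuming sorted keys and two-space indentation as the canonical form; the function name canonical and the sample documents are hypothetical.

```python
import difflib
import json


def canonical(doc: str) -> str:
    """Re-serialize a JSON document with sorted keys and fixed indentation."""
    return json.dumps(json.loads(doc), sort_keys=True, indent=2, ensure_ascii=False)


a = '{"port": 5432, "host": "localhost"}'
b = '{\n  "host": "localhost",\n  "port": 5432\n}'  # same data, different layout
c = '{"host": "localhost", "port": 5433}'  # different data

same_data = canonical(a) == canonical(b)

diff = difflib.unified_diff(
    canonical(a).splitlines(), canonical(c).splitlines(), lineterm=""
)
# Keep only added/removed lines, skipping the "---" and "+++" file headers.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

print(same_data)
print(changed)
```

Key order and whitespace disappear after normalization, so the only lines left in the diff are the ones where the data itself changed.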

The verdict: a concrete decision matrix

Use YAML for:

  • CI/CD pipeline configuration (GitHub Actions, GitLab CI, CircleCI)
  • Kubernetes and Helm chart manifests
  • Ansible playbooks and inventory files
  • Docker Compose files
  • Any configuration file where humans write comments explaining why settings are set

Use JSON for:

  • API request and response bodies
  • Machine-generated configuration (terraform.tfstate, package-lock.json)
  • Database fixtures and test data
  • Inter-service messages and event payloads
  • Any data written by code and read by code

If the file will be touched by humans more often than it is generated by machines, and you want comments, use YAML. If machines write it and other machines read it, use JSON. And if you are making a new API in 2026: JSON, always JSON, unless you have a specific reason that requires binary efficiency (in which case you should be looking at MessagePack or Protocol Buffers, not YAML).
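
The rule is small enough to write down as a function. This is only a sketch of the decision described above; the names Format and choose_format, and the two boolean inputs, are hypothetical simplifications.

```python
from enum import Enum


class Format(Enum):
    YAML = "YAML"
    JSON = "JSON"
    BINARY = "MessagePack or Protocol Buffers"


def choose_format(human_edits_dominate: bool, needs_binary_efficiency: bool = False) -> Format:
    """Apply the article's rule of thumb to a single file or payload."""
    if human_edits_dominate:
        # Humans write and comment it more often than machines generate it.
        return Format.YAML
    if needs_binary_efficiency:
        # Machine-to-machine data with a specific size or speed requirement.
        return Format.BINARY
    return Format.JSON


print(choose_format(human_edits_dominate=True).value)
print(choose_format(human_edits_dominate=False).value)
print(choose_format(human_edits_dominate=False, needs_binary_efficiency=True).value)
```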

Published May 21, 2026 · By the utili.dev Team