TOON

TOON vs JSON: Byte-Level Efficiency Model

A mathematical analysis of TOON's byte efficiency compared to JSON across different data structures.

Scope of This Document

This page presents a theoretical, character-based comparison between TOON and JSON. For practical benchmarks and token counts, see Benchmarks. It is an advanced, non-normative reference: it explains TOON's design from a mathematical angle but does not change the TOON specification.

Overview

Large Language Models increasingly rely on structured data for inference and function calling. However, standard formats like JSON introduce significant verbosity that inflates token usage and inference costs. This analysis presents a formal mathematical comparison between TOON and JSON to evaluate whether TOON achieves quantifiable efficiency gains by eliminating structural redundancy.

Under the assumptions described below (compact JSON, canonical TOON, ASCII keys and punctuation, shallow to moderate nesting, and mostly unquoted TOON strings), TOON's structural overhead is lower than compact JSON for the structure families analyzed here, except arrays of arrays.

Key Findings

  • Tabular arrays represent TOON's optimal use case, with efficiency gains scaling linearly with both row count and field count.
  • Simple objects and primitive arrays show consistent byte reduction, with savings proportional to the number of fields or elements.
  • Nested objects benefit from reduced overhead, though efficiency decreases with depth due to indentation costs; at sufficient depth, compact JSON can become smaller.
  • Arrays of arrays are the only structure where TOON is less efficient than JSON in this analysis, due to TOON's explicit list markers and inner array headers.

Methodology

We define recursive byte-length functions $L_{\text{json}}$ and $L_{\text{toon}}$ for both formats, then derive the efficiency delta:

$$\Delta = L_{\text{json}}(\Omega) - L_{\text{toon}}(\Omega)$$

Where $\Omega$ represents the data structure under comparison. If $\Delta > 0$, TOON uses fewer bytes than JSON for that structure.

Scope & Assumptions

  • Compact JSON: JSON is assumed to be compact (no spaces or newlines outside strings). Byte counts are computed on this compact form.
  • Canonical TOON: TOON is assumed to follow canonical formatting (indent = 2 spaces, exactly one space after :, no spaces after commas in arrays/field lists, no trailing spaces).
  • Keys and strings: All keys are "simple" ASCII identifier-style keys that:
    • must be quoted in JSON, and
    • can be left unquoted in TOON (no characters that would force quoting).
    Many examples also assume values are numbers, booleans, null, or TOON-safe strings that can be unquoted in TOON but must be quoted in JSON.
  • Numbers: Both formats are assumed to use the same canonical decimal representation (no exponent notation), matching TOON's requirement. JSON could use exponent forms; we ignore that here to isolate structural differences.
  • ASCII/UTF-8: Keys and structural tokens are assumed ASCII, so byte length equals character count ($|x|_{\text{utf8}} = |x|_{\text{char}}$). Non-ASCII content affects both formats similarly and does not change the structural conclusions.
  • Nesting depth: Closed-form expressions are given for flat structures and a single level of nesting. Each additional nesting level in TOON adds 2 bytes of indentation per nested line. At sufficient depth, the braces of compact JSON can win over TOON's indentation (as seen in When Not to Use TOON).
  • Byte vs token count: Modern LLM tokenizers operate over UTF-8 bytes, so byte length is a good upper bound and first-order proxy for token count, even though the mapping is not exactly linear.

Think of this as a simplified structural model: we strip away real-world noise and ask, "if you only count structural characters, how do JSON and TOON compare?"
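The byte counts used throughout this page can be reproduced with a few lines of Python. The sketch below is illustrative only: the helper names `utf8_len` and `compact_json` are hypothetical and are not part of any TOON tooling. It shows how the compact JSON form is obtained and that byte length equals character count only for ASCII content.

```python
import json

def utf8_len(text: str) -> int:
    """Byte length of text under UTF-8."""
    return len(text.encode("utf-8"))

def compact_json(value) -> str:
    """Serialize with no spaces or newlines outside strings (the compact form assumed here)."""
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)

ascii_doc = compact_json({"id": 1, "name": "Ada"})
# For ASCII content, bytes and characters coincide.
print(ascii_doc, utf8_len(ascii_doc), len(ascii_doc))

non_ascii = "café"
# One non-ASCII character takes two bytes here, so the counts differ.
print(utf8_len(non_ascii), len(non_ascii))
```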

Formal Notation

Data Model

Let $\omega$ be a primitive value such that $\omega \in \{\text{string}, \text{number}, \text{boolean}, \text{null}\}$.

Let $O$ be an object composed of $n$ key-value pairs:

$$O = \{(k_1, v_1), (k_2, v_2), \ldots, (k_n, v_n)\}$$

Let $A$ be an array composed of $n$ elements:

$$A = \{v_1, v_2, \ldots, v_n\}$$

Where:

  • $k_i$ is a key (string)
  • $v_i$ can be a primitive value $\omega$, an object $O$, or an array $A$

Therefore: $v_i \in \{\omega, O, A\}$

String Length

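Let $S$ be the set of valid Unicode strings. For any string $x \in S$, we denote $|x|_{\text{utf8}}$ as the byte length of $x$ under UTF-8 encoding.

Integer Length

Let $n \in \mathbb{Z}_{\ge 0}$ be a non-negative integer. The number of bytes required to represent $n$ in decimal format is:

$$L_{\text{num}}(n) = \begin{cases} 1 & \text{if } n = 0 \\ \lfloor \log_{10}(n) \rfloor + 1 & \text{if } n > 0 \end{cases}$$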
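As a quick check of the integer-length definition, here is a minimal Python sketch. The function name `l_num` is an assumption made for illustration. It counts decimal digits through the string form of the integer, which agrees with the floor-of-log definition for non-negative integers while avoiding floating-point rounding near powers of ten.

```python
def l_num(n: int) -> int:
    """Number of bytes needed to write a non-negative integer in decimal."""
    if n < 0:
        raise ValueError("the model only defines L_num for non-negative integers")
    # len(str(n)) equals floor(log10(n)) + 1 for n > 0, and 1 for n == 0.
    return len(str(n))

for n in (0, 7, 10, 999, 1_000_000):
    print(n, l_num(n))
```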

JSON Size Functions

For a flat object of n keys:

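$$L_{\text{json}}(O) = \underbrace{2}_{\{\}} + \sum_{i=1}^{n} \Bigl( L_{\text{str}}(k_i) + \underbrace{1}_{:} + L_{\text{json}}(v_i) \Bigr) + \underbrace{(n-1)}_{\text{commas}}$$

Where $L_{\text{str}}(k)$ is the length of the key including its mandatory quotes:

$$L_{\text{str}}(k) = |k|_{\text{utf8}} + \underbrace{2}_{\text{quotes}}$$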

Primitive Values in JSON

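When $v_i$ is a primitive data type $\omega$:

| Type | Formula |
| --- | --- |
| String | $L_{\text{str}}(v_i) = \lvert v_i \rvert_{\text{utf8}} + 2$ |
| Number | $L_{\text{num}}(v_i) = \lvert v_i \rvert_{\text{utf8}}$ |
| Boolean | $L_{\text{bool}}(v_i) = \lvert v_i \rvert_{\text{utf8}}$ |
| Null | $L_{\text{null}}(v_i) = \lvert v_i \rvert_{\text{utf8}}$ |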

Arrays in JSON

When $v_i$ is an array $A$:

$$L_{\text{json}}(A) = \underbrace{2}_{[\,]} + \sum_{i=1}^{n} L_{\text{json}}(v_i) + \underbrace{(n-1)}_{\text{commas}}$$
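The JSON size functions above translate directly into a recursive function. The following is a sketch under this page's assumptions (no characters that need escaping, integers only, no floats); the name `l_json` is chosen to mirror the notation and is not an existing API. The last lines compare the structural count with the length of an actual compact serialization.

```python
import json

def l_json(value) -> int:
    """Byte length of compact JSON, counted structurally rather than by serializing."""
    if isinstance(value, dict):
        # 2 for {}, then per pair: quoted key (+2), colon (+1), value; n-1 commas.
        pairs = sum(len(k.encode("utf-8")) + 2 + 1 + l_json(v) for k, v in value.items())
        return 2 + pairs + max(len(value) - 1, 0)
    if isinstance(value, list):
        # 2 for [], then the elements and n-1 commas.
        return 2 + sum(l_json(v) for v in value) + max(len(value) - 1, 0)
    if isinstance(value, str):
        return len(value.encode("utf-8")) + 2  # assumes nothing needs escaping
    if value is None:
        return 4  # null
    if isinstance(value, bool):  # must be tested before int
        return 4 if value else 5  # true / false
    if isinstance(value, int):
        return len(str(value))
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

sample = {"user": {"id": 1, "name": "Ada"}, "tags": ["foo", "bar"], "ok": True, "none": None}
measured = len(json.dumps(sample, separators=(",", ":")).encode("utf-8"))
print(l_json(sample), measured)
```

Both printed numbers should agree for any input that stays within the stated assumptions.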

TOON Size Functions

For a flat object of n keys:

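$$L_{\text{toon}}(O) = \sum_{i=1}^{n} \Bigl( L_{\text{str}}(k_i) + \underbrace{1}_{:} + \underbrace{1}_{\text{space}} + L_{\text{toon}}(v_i) \Bigr) + \underbrace{(n-1)}_{\text{newlines}}$$

Where $L_{\text{str}}(k)$ is the length of the key (no quotes required for simple keys):

$$L_{\text{str}}(k) = |k|_{\text{utf8}}$$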

Primitive Values in TOON

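When $v_i$ is a primitive data type $\omega$:

| Type | Formula |
| --- | --- |
| String (normal) | $L_{\text{str}}(v_i) = \lvert v_i \rvert_{\text{utf8}}$ |
| String (looks like number/boolean) | $L_{\text{str}}(v_i) = \lvert v_i \rvert_{\text{utf8}} + 2$ |
| Number | $L_{\text{num}}(v_i) = \lvert v_i \rvert_{\text{utf8}}$ |
| Boolean | $L_{\text{bool}}(v_i) = \lvert v_i \rvert_{\text{utf8}}$ |
| Null | $L_{\text{null}}(v_i) = \lvert v_i \rvert_{\text{utf8}}$ |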
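To make the flat-object and primitive rules concrete, here is a small Python sketch that renders a flat object in the TOON layout assumed by this model. It is not a TOON implementation: the function names are hypothetical, and the quoting rule is deliberately simplified to the one case in the table (strings that look like numbers, booleans, or null). All other strings are assumed to be TOON-safe.

```python
import re

# Simplified stand-in for "looks like a number"; the real quoting rules are broader.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")

def looks_like_literal(text: str) -> bool:
    return text in ("true", "false", "null") or _NUMBER.fullmatch(text) is not None

def toon_scalar(value) -> str:
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be tested before int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Quote only literal-looking strings; everything else is assumed TOON-safe.
        return f'"{value}"' if looks_like_literal(value) else value
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

def toon_flat_object(obj: dict) -> str:
    """One `key: value` line per field, joined by newlines, no braces at the root."""
    return "\n".join(f"{key}: {toon_scalar(value)}" for key, value in obj.items())

plain = toon_flat_object({"id": 1, "name": "Ada"})
quoted = toon_flat_object({"version": "123", "enabled": "true"})
print(plain, len(plain.encode("utf-8")), sep="\n")
print(quoted, len(quoted.encode("utf-8")), sep="\n")
```

The first object takes 15 bytes and the second 30 bytes in this layout, which matches the per-field terms of the TOON size function above.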

Simple Arrays in TOON

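Here $L_{\text{toon}}(A)$ refers to the length of the whole field line key[N]: ..., not just the array value.

When $v_i$ is a simple array $A$:

$$L_{\text{toon}}(A) = L_{\text{str}}(k_i) + \underbrace{1}_{[} + L_{\text{num}}(n) + \underbrace{1}_{]} + \underbrace{1}_{:} + \underbrace{1}_{\text{space}} + \sum_{i=1}^{n} L_{\text{toon}}(v_i) + \underbrace{(n-1)}_{\text{commas}}$$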
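The simple-array formula can be checked against an actual rendered line. The sketch below uses hypothetical helper names and assumes every element is an integer or a TOON-safe ASCII string. The empty-array branch follows the 7-byte form implied by the Empty Structures section of this page and should be read as an assumption, not as a statement of the specification.

```python
def toon_array_line(values, key: str = "") -> str:
    """Render `key[N]: v1,v2,...`; an empty key gives the root form `[N]: ...`."""
    if not values:
        return f"{key}[0]:"  # assumed form, no trailing space
    cells = ",".join(str(v) for v in values)
    return f"{key}[{len(values)}]: {cells}"

def predicted_len(values, key: str = "") -> int:
    """Term-by-term evaluation of the simple-array formula, for n >= 1."""
    n = len(values)
    brackets_colon_space = 4  # "[", "]", ":", " "
    return (len(key) + len(str(n)) + brackets_colon_space
            + sum(len(str(v)) for v in values) + (n - 1))

tags = ["foo", "bar", "baz"]
line = toon_array_line(tags, "tags")
print(line, len(line), predicted_len(tags, "tags"))

root = toon_array_line(["x", "y", "z"])
print(root, len(root), predicted_len(["x", "y", "z"]))
```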

Tabular Arrays in TOON

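When $v_i$ is an array of objects with $m$ fields:

$$L_{\text{toon}}(A) = L_{\text{str}}(k_i) + \underbrace{1}_{[} + L_{\text{num}}(n) + \underbrace{1}_{]} + \underbrace{1}_{\{} + \sum_{i=1}^{m} L_{\text{str}}(k_i) + \underbrace{(m-1)}_{\text{commas}} + \underbrace{1}_{\}} + \underbrace{1}_{:} + \underbrace{2n}_{\text{indents}} + \sum_{i=1}^{n} \sum_{j=1}^{m} L_{\text{toon}}(v_{ij}) + \underbrace{(m-1)n}_{\text{commas}} + \underbrace{n}_{\text{newlines}}$$

Note: The term $2n$ assumes an indentation size of 2 spaces.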
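A tabular array is easy to render by hand, which makes the formula simple to sanity-check. The sketch below is a hypothetical helper, not a TOON encoder: it assumes uniform rows, integer or TOON-safe values, 2-space indentation, and a single key at the document root. It compares the rendered size with compact JSON for the same data.

```python
import json

def toon_table(key: str, rows: list) -> str:
    """Render `key[N]{f1,f2,...}:` followed by one indented row of bare values per object."""
    fields = list(rows[0].keys())
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows must share the same fields in the same order")
    field_list = ",".join(fields)
    header = f"{key}[{len(rows)}]{{{field_list}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)

rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]
toon_text = toon_table("items", rows)
json_text = json.dumps({"items": rows}, separators=(",", ":"))

print(toon_text)
print(len(json_text.encode("utf-8")), len(toon_text.encode("utf-8")))
```

For this two-row input the compact JSON is 45 bytes and the tabular rendering is 29 bytes, a difference of 16.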

Efficiency Analysis by Structure

Each subsection below focuses on a particular structure family, states the resulting formula, and shows a small example. Intuitively, TOON tends to win when it can:

  • avoid repeating keys (tabular arrays),
  • avoid quoting keys and many values,
  • and replace braces with indentation,

and tends to lose when it pays a fixed overhead per element (arrays of arrays) or deep indentation (heavily nested configs).

Simple Objects

Flat objects with primitive string values are the easiest win: JSON pays for braces and quoted keys and strings, while TOON drops braces at the root, omits quotes on simple keys, and uses one line per field.

For objects with only string primitives:

$$\Delta_{\text{obj}} = 2 + n + \sum_{i=1}^{n} L_{\text{json}}(v_i) - \sum_{i=1}^{n} L_{\text{toon}}(v_i)$$

If all values are strings that can be unquoted in TOON, this simplifies to:

$$f(n) = 2 + 3n$$
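For readers who want the intermediate step, the delta follows from subtracting the two flat-object size functions defined earlier, with $L_{\text{str}}(k) = |k| + 2$ in JSON and $L_{\text{str}}(k) = |k|$ in TOON. This is a sketch of the algebra under the stated assumptions; the key lengths cancel:

```latex
\begin{aligned}
L_{\text{json}}(O) &= 2 + \sum_{i=1}^{n}\bigl(|k_i| + 2 + 1 + L_{\text{json}}(v_i)\bigr) + (n-1)
  = 1 + 4n + \sum_{i=1}^{n}|k_i| + \sum_{i=1}^{n} L_{\text{json}}(v_i) \\
L_{\text{toon}}(O) &= \sum_{i=1}^{n}\bigl(|k_i| + 1 + 1 + L_{\text{toon}}(v_i)\bigr) + (n-1)
  = 3n - 1 + \sum_{i=1}^{n}|k_i| + \sum_{i=1}^{n} L_{\text{toon}}(v_i) \\
\Delta_{\text{obj}} &= L_{\text{json}}(O) - L_{\text{toon}}(O)
  = 2 + n + \sum_{i=1}^{n} L_{\text{json}}(v_i) - \sum_{i=1}^{n} L_{\text{toon}}(v_i)
\end{aligned}
```

When every value is a TOON-safe string, each value costs exactly 2 more bytes in JSON (its quotes), so the difference of the two sums is $2n$ and the delta becomes $2 + 3n$.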

Example: For a flat object with n = 1,000,000 string fields, TOON saves 3,000,002 bytes ≈ 2.86 MB.

Empirical Validation

json

{ "id": 1, "name": "Ada" }
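
$$\Delta_{\text{obj}} = 2 + \underbrace{2}_{n} + \underbrace{6}_{\sum L_{\text{json}}(v_i)} - \underbrace{4}_{\sum L_{\text{toon}}(v_i)} = 6$$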

Nested Objects

Adding a wrapper object (one extra level of nesting) introduces extra braces for JSON and extra indentation and newlines for TOON. For a single level of nesting with primitive values, TOON still comes out ahead, but the net advantage is smaller.

For a single level of nesting where the inner object holds n TOON-safe string values:

$$f(n) = 5 + n$$

Example: For a nested object (depth 1) with n = 1,000,000 string fields, TOON saves 1,000,005 bytes ≈ 0.95 MB.

Caveat

This formula is for a single nesting level. Each additional nesting level adds 2 spaces of indentation per nested line; at sufficient depth, compact JSON can become smaller, especially when tabular opportunities disappear (see When Not to Use TOON and the "Deeply nested configuration" dataset in Benchmarks).
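The effect of depth can be measured directly. The following Python sketch wraps a small leaf object in a chain of single-key objects and reports the byte delta at each depth. Everything here is illustrative: the helper names are hypothetical, the TOON rendering covers only nested objects with integer or TOON-safe string values, and the wrapper key `user` is an arbitrary choice.

```python
import json

def compact_json_len(value) -> int:
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))

def toon_lines(obj: dict, depth: int = 0) -> list:
    """Render nested objects with 2 spaces of indentation per level."""
    pad = "  " * depth
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(toon_lines(value, depth + 1))
        else:
            lines.append(f"{pad}{key}: {value}")  # ints and TOON-safe strings only
    return lines

def delta_at_depth(leaf: dict, depth: int, wrapper_key: str = "user") -> int:
    """JSON bytes minus TOON bytes after wrapping the leaf `depth` times."""
    doc = leaf
    for _ in range(depth):
        doc = {wrapper_key: doc}
    toon_len = len("\n".join(toon_lines(doc)).encode("utf-8"))
    return compact_json_len(doc) - toon_len

leaf = {"id": 1, "name": "Ada"}
for depth in range(5):
    print(depth, delta_at_depth(leaf, depth))
```

For this leaf the delta is 6, 5, 2, and then -3 at depths 0 through 3, so compact JSON becomes smaller at the third wrapper. Each wrapper saves a fixed 3 bytes of punctuation but adds 2 bytes of indentation to every line beneath it.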

Empirical Validation

json

{ "user": { "id": 1, "name": "Ada" } }

yaml

user:
  id: 1
  name: Ada

$$\Delta_{\text{nested}} = 5$$

Primitive Arrays

For arrays of string primitives, JSON writes ["foo","bar","baz"], quoting every string and using [] for the array. TOON writes key[N]: foo,bar,baz, paying once for the length marker but omitting most quotes.

For arrays of n string primitives:

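$$\Delta_{\text{arr}} = 3 - L_{\text{num}}(n) + \sum_{i=1}^{n} L_{\text{json}}(v_i) - \sum_{i=1}^{n} L_{\text{toon}}(v_i)$$

With string values that can be unquoted in TOON, this simplifies to:

$$f(n) = 2 + 2n - \lfloor \log_{10}(n) \rfloor$$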

Example: For 1,000,000 elements, TOON saves 1,999,996 bytes ≈ 1.91 MB.

Empirical Validation

json

{ "tags": ["foo", "bar", "baz"] }
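
$$\Delta_{\text{arr}} = 3 - \underbrace{1}_{L_{\text{num}}(3)} + \underbrace{15}_{\sum L_{\text{json}}} - \underbrace{9}_{\sum L_{\text{toon}}} = 8$$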

Root Arrays

At the root, JSON writes ["x","y","z"]; TOON writes [3]: x,y,z. There is no object key cost, so the advantage mainly comes from not quoting TOON-safe strings and from replacing [] with [N]:.

For root-level arrays of n string primitives:

$$f(n) = -3 + 2n - \lfloor \log_{10}(n) \rfloor$$

Example: For 1,000,000 elements, TOON saves 1,999,991 bytes ≈ 1.91 MB.

Empirical Validation

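$$\Delta_{\text{root}} = \underbrace{9}_{\sum L_{\text{json}}} - 2 - \underbrace{1}_{L_{\text{num}}(3)} - \underbrace{3}_{\sum L_{\text{toon}}} = 3$$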

Tabular Arrays

Uniform arrays of objects are TOON's sweet spot. JSON repeats every key for every row, while TOON declares the length and column names once (key[N]{id,qty,...}:) and streams rows as bare values.

For arrays of objects with n rows and m fields, assuming numeric values and |k|=3:

$$f(n) = 1 + nm(3 + |k|) - m(1 + |k|) - \lfloor \log_{10}(n) \rfloor$$

Example: For 1,000,000 rows with 2 fields and 3-character field names, TOON saves 11,999,987 bytes ≈ 11.44 MB.

This is where TOON's design (declare fields once, stream rows) pays off most strongly: savings grow linearly with both row count and field count.
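The closed form can be traced back to the size functions with a short derivation. The sketch below assumes the array sits under a key $K$ in a root object, that all $m$ field names have the same length $|k|$, and that indentation is 2 spaces; the symbol $K$ is introduced here only for this derivation.

```latex
\begin{aligned}
L_{\text{json}} &= |K| + 6 + 2n + nm\,(4 + |k|) + \sum_{i=1}^{n}\sum_{j=1}^{m} L_{\text{json}}(v_{ij}) \\
L_{\text{toon}} &= |K| + 4 + L_{\text{num}}(n) + m\,(1 + |k|) + 2n + nm + \sum_{i=1}^{n}\sum_{j=1}^{m} L_{\text{toon}}(v_{ij}) \\
\Delta_{\text{tab}} &= 2 - L_{\text{num}}(n) + nm\,(3 + |k|) - m\,(1 + |k|)
  + \sum_{i,j} L_{\text{json}}(v_{ij}) - \sum_{i,j} L_{\text{toon}}(v_{ij})
\end{aligned}
```

With numeric values the two value sums cancel, and substituting $L_{\text{num}}(n) = \lfloor \log_{10}(n) \rfloor + 1$ gives $f(n)$. For $n = 1{,}000{,}000$, $m = 2$, $|k| = 3$ this evaluates to $1 + 12{,}000{,}000 - 8 - 6 = 11{,}999{,}987$.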

Empirical Validation

json

{ "items": [{ "id": 1, "qty": 5 }, { "id": 2, "qty": 3 }] }

yaml

items[2]{id,qty}:
  1,5
  2,3
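
$$\Delta_{\text{tab}} = 2 + \underbrace{4}_{nm} - \underbrace{2}_{m} + \underbrace{22}_{\sum L_{\text{json}}} - \underbrace{1}_{L_{\text{num}}(n)} - \underbrace{5}_{\sum L_{\text{toon}}(k)} - \underbrace{4}_{\sum L_{\text{toon}}(v)} = 16$$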

Arrays of Arrays

Arrays of arrays of primitives are where TOON structurally loses: each inner array becomes a list item with its own header, so TOON pays a fixed overhead per inner array ("- " plus "[m]: "), while JSON just uses commas.

Practical Note

For arrays of arrays of primitives, this model predicts that JSON is more byte-efficient than TOON, because TOON pays ~6 extra bytes per inner array (2 for "- ", 4 for "[m]: "), plus the length marker.

For arrays of arrays with n outer elements and m inner elements:

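$$\Delta_{\text{arrarr}} = 2 - 6n - \sum_{i=1}^{n} L_{\text{num}}(m) + \sum_{i=1}^{n} \sum_{j=1}^{m} L_{\text{json}}(v_{ij}) - \sum_{i=1}^{n} \sum_{j=1}^{m} L_{\text{toon}}(v_{ij})$$

With string primitives and m=2:

$$f(n) = 2 - 6n - \sum_{i=1}^{n} \bigl( \lfloor \log_{10}(m) \rfloor + 1 \bigr) + 2nm$$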

Example: For 1,000,000 arrays with m=2, TOON wastes 2,999,998 bytes ≈ 2.86 MB relative to JSON under this model.

Empirical Validation

json

{ "pairs": [[1, 2], [3, 4]] }

yaml

pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
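
$$\Delta_{\text{arrarr}} = 2 - \underbrace{12}_{6n} - \underbrace{2}_{\sum L_{\text{num}}(m)} + \underbrace{4}_{\sum L_{\text{json}}} - \underbrace{4}_{\sum L_{\text{toon}}} = -12$$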
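The per-row penalty is easy to observe by measuring a few sizes. The sketch below renders arrays of numeric pairs in the list-item layout shown above and compares them with compact JSON. The helper names are hypothetical, and the rendering assumes single-level inner arrays of integers under one root key.

```python
import json

def toon_nested_arrays(key: str, rows: list) -> str:
    """Render each inner array as an indented `- [m]: ...` list item."""
    lines = [f"{key}[{len(rows)}]:"]
    for row in rows:
        cells = ",".join(str(v) for v in row)
        lines.append(f"  - [{len(row)}]: {cells}")
    return "\n".join(lines)

def delta(rows: list, key: str = "pairs") -> int:
    """JSON bytes minus TOON bytes; negative means JSON is smaller."""
    json_bytes = len(json.dumps({key: rows}, separators=(",", ":")).encode("utf-8"))
    toon_bytes = len(toon_nested_arrays(key, rows).encode("utf-8"))
    return json_bytes - toon_bytes

for n in (2, 3, 4):
    rows = [[2 * i + 1, 2 * i + 2] for i in range(n)]
    print(n, delta(rows))
```

With single-digit values the delta is -12, -19, and -26 for 2, 3, and 4 pairs. Each extra inner array costs TOON 7 more bytes than JSON here: the 6 fixed bytes described above plus one digit for the inner length marker.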

Strings That Look Like Literals

Strings that look like numbers or booleans (e.g. "123", "true") must be quoted in both JSON and TOON, slightly reducing TOON's advantage because it no longer saves quotes on those values.

For objects containing such strings:

$$\Delta_{\text{strlit}} = 2 + n$$

Example: For an object with n = 1,000,000 such fields, TOON saves 1,000,002 bytes ≈ 0.95 MB.

Empirical Validation

json

{ "version": "123", "enabled": "true" }

yaml

version: "123"
enabled: "true"

$$\Delta_{\text{str}} = 2 + \underbrace{2}_{n} = 4$$

Empty Structures

Empty containers reveal structural differences even at minimal sizes.

Empty Object:

$$\Delta_{\text{EmptyObject}} = 2$$

JSON requires {} (2 bytes), whereas a completely empty root object in TOON is represented as an empty document (0 bytes).

Empty Array (field):

$$\Delta_{\text{EmptyArray}} = 3$$

For a field named key, JSON uses {"key":[]} in compact form (10 bytes), while TOON uses key[0]: (7 bytes).

Under this model, that yields a constant 3-byte advantage for TOON.

Summary Table

The table below summarizes the formulas and which side wins under the modeling assumptions.

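| Structure | Efficiency Formula | TOON Advantage? |
| --- | --- | --- |
| Simple Objects | $f(n) = 2 + 3n$ | ✅ Yes |
| Nested Objects (1 level) | $f(n) = 5 + n$ | ✅ Yes (shrinks with depth) |
| Primitive Arrays | $f(n) = 2 + 2n - \lfloor \log_{10}(n) \rfloor$ | ✅ Yes |
| Root Arrays | $f(n) = -3 + 2n - \lfloor \log_{10}(n) \rfloor$ | ✅ Yes |
| Tabular Arrays | $f(n) = 1 + nm(3 + \lvert k \rvert) - m(1 + \lvert k \rvert) - \lfloor \log_{10}(n) \rfloor$ | Best case |
| Arrays of Arrays | $f(n) = 2 - 6n + 2nm - \text{overhead}$ | ❌ JSON wins here |
| String Literals | $f(n) = 2 + n$ | ✅ Yes (smaller gain) |
| Empty Structures | $\Delta = 2$ or $3$ | ✅ Yes |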

In short:

  • TOON's gains are linear in the number of fields for flat objects.
  • For arrays, gains grow linearly in the number of elements, and for tabular arrays linearly in both rows and fields.
  • Arrays of arrays are the main structural case where JSON is smaller.
  • Deep nesting and heavy quoting can erode or reverse these advantages in real data.
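The closed forms can be evaluated side by side to reproduce the worked examples. The sketch below is a plain transcription of the formulas into Python with hypothetical function names. The signs are chosen so that each function reproduces the byte counts quoted in its section, and the arrays-of-arrays form charges one inner length marker per inner array. The page's "MB" figures correspond to division by 2^20.

```python
def floor_log10(n: int) -> int:
    """floor(log10(n)) for a positive integer, computed without floating point."""
    return len(str(n)) - 1

def simple_objects(n: int) -> int:
    return 2 + 3 * n

def nested_objects(n: int) -> int:
    return 5 + n

def primitive_arrays(n: int) -> int:
    return 2 + 2 * n - floor_log10(n)

def root_arrays(n: int) -> int:
    return -3 + 2 * n - floor_log10(n)

def tabular_arrays(n: int, m: int, k: int) -> int:
    return 1 + n * m * (3 + k) - m * (1 + k) - floor_log10(n)

def arrays_of_arrays(n: int, m: int) -> int:
    # String primitives: 2nm quote savings against 6 bytes plus one length marker per row.
    return 2 - 6 * n - n * (floor_log10(m) + 1) + 2 * n * m

N = 1_000_000
results = {
    "simple objects": simple_objects(N),
    "nested objects": nested_objects(N),
    "primitive arrays": primitive_arrays(N),
    "root arrays": root_arrays(N),
    "tabular arrays (m=2, |k|=3)": tabular_arrays(N, 2, 3),
    "arrays of arrays (m=2)": arrays_of_arrays(N, 2),
}
for name, saved in results.items():
    print(f"{name}: {saved:,} bytes ({saved / 2**20:.2f} MB)")
```

A negative result means compact JSON is smaller for that structure under this model.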

Conclusion

This simplified theoretical model supports TOON's design goal: structurally, it reduces overhead compared to compact JSON in many common patterns by:

  • avoiding repeated keys in tabular arrays,
  • omitting quotes on many keys and values,
  • and replacing braces with indentation at shallow depths.

For the structure families examined here and under the stated assumptions, the structural overhead of TOON is lower than that of compact JSON except for arrays of arrays. Since UTF-8 byte length is a reasonable first-order proxy for tokens, these structural savings usually translate into lower token counts in those patterns.

At the same time, this is deliberately a simplified model. In real datasets, additional factors – deeper or irregular nesting, heavily quoted strings, exponent notation in JSON, and tokenizer idiosyncrasies – can reduce or even reverse these gains. Our Benchmarks and When Not to Use TOON show that compact JSON can be more efficient for deeply nested or low-tabularity data. Use this page as intuition for why TOON behaves the way it does, not as a universal guarantee.

  • Benchmarks – Empirical token count and accuracy comparisons across formats
  • Specification – Formal TOON specification

This page was contributed by Mateo Lafalce (@mateolafalce).

Have questions or found an error in the formalization? Open an issue on GitHub or contribute improvements to this analysis.