TOON vs JSON: Byte-Level Efficiency Model
A mathematical analysis of TOON's byte efficiency compared to JSON across different data structures.
Scope of This Document
This page presents a theoretical, character-based comparison between TOON and JSON. For practical benchmarks and token counts, see Benchmarks. It is an advanced, non-normative reference: it explains TOON's design from a mathematical angle but does not change the TOON specification.
Overview
Large Language Models increasingly rely on structured data for inference and function calling. However, standard formats like JSON introduce significant verbosity that inflates token usage and inference costs. This analysis presents a formal mathematical comparison between TOON and JSON to evaluate whether TOON achieves quantifiable efficiency gains by eliminating structural redundancy.
Under the assumptions described below (compact JSON, canonical TOON, ASCII keys and punctuation, shallow to moderate nesting, and mostly unquoted TOON strings), TOON's structural overhead is lower than compact JSON for the structure families analyzed here, except arrays of arrays.
Key Findings
- Tabular arrays represent TOON's optimal use case, with efficiency gains scaling linearly with both row count and field count.
- Simple objects and primitive arrays show consistent byte reduction, with savings proportional to the number of fields or elements.
- Nested objects benefit from reduced overhead, though efficiency decreases with depth due to indentation costs; at sufficient depth, compact JSON can become smaller.
- Arrays of arrays are the only structure where TOON is less efficient than JSON in this analysis, due to TOON's explicit list markers and inner array headers.
Methodology
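We define recursive byte-length functions $L_{\text{json}}(x)$ and $L_{\text{toon}}(x)$, which return the number of UTF-8 bytes needed to encode a value $x$ as compact JSON and as canonical TOON respectively.
Where $\Delta = L_{\text{json}} - L_{\text{toon}}$ is the number of bytes TOON saves: $\Delta > 0$ means TOON is smaller, $\Delta < 0$ means JSON is smaller.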
Scope & Assumptions
- Compact JSON: JSON is assumed to be compact (no spaces or newlines outside strings). Byte counts are computed on this compact form.
- Canonical TOON: TOON is assumed to follow canonical formatting (indent = 2 spaces, exactly one space after :, no spaces after commas in arrays/field lists, no trailing spaces).
- Keys and strings: All keys are "simple" ASCII identifier-style keys that:
  - must be quoted in JSON, and
  - can be left unquoted in TOON (no characters that would force quoting).
  Many examples assume values are numbers, booleans, null, or TOON-safe strings (strings that can be left unquoted in TOON but must be quoted in JSON).
- Numbers: Both formats are assumed to use the same canonical decimal representation (no exponent notation), matching TOON's requirement. JSON could use exponent forms; we ignore that here to isolate structural differences.
- ASCII/UTF-8: Keys and structural tokens are assumed ASCII, so byte length equals character count. Non-ASCII content affects both formats similarly and does not change the structural conclusions.
- Nesting depth: Closed-form expressions are given for flat structures and a single level of nesting. Each additional nesting level in TOON adds 2 bytes of indentation per nested line. At sufficient depth, the braces of compact JSON can win over TOON's indentation (as seen in When Not to Use TOON).
- Byte vs token count: Many modern LLM tokenizers use byte-level BPE over UTF-8, where every token covers at least one byte, so byte length is an upper bound on token count and a first-order proxy for it, even though the mapping is not exactly linear.
Think of this as a simplified structural model: we strip away real-world noise and ask, "if you only count structural characters, how do JSON and TOON compare?"
Formal Notation
Data Model
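Let $p$ be a primitive value: a string, number, boolean, or null.
Let $o = \{(k_1, v_1), \dots, (k_n, v_n)\}$ be an object with $n$ key-value pairs.
Let $a = [v_1, \dots, v_N]$ be an array with $N$ elements.
Where:
- $k_i$ is a key (string)
- $v_i$ can be a primitive value $p$, an object $o$, or an array $a$
Therefore:
$$v_i \in \{p, o, a\}$$
String Length
Let $|s|$ be the length of string $s$ in UTF-8 bytes, excluding any surrounding quotes.
Integer Length
Let $\operatorname{len}(N)$ be the number of decimal digits of a non-negative integer $N$, for example $\operatorname{len}(1000000) = 7$.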
JSON Size Functions
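For a flat object of $n$ fields with keys $k_i$ and values $v_i$:
$$L_{\text{json}}(o) = 2 + \sum_{i=1}^{n} \bigl(|k_i| + 3 + L_{\text{json}}(v_i)\bigr) + (n - 1)$$
Where 2 counts the braces, $|k_i|$ is the byte length of key $k_i$, 3 counts the two key quotes and the colon, and $n - 1$ counts the commas between fields.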
Primitive Values in JSON
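When $v$ is a primitive value (string escapes are ignored in this model):
| Type | Formula |
|---|---|
| String | $\lvert s \rvert + 2$ (two quotes) |
| Number | number of characters in the canonical decimal form |
| Boolean | 4 (true) or 5 (false) |
| Null | 4 |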
Arrays in JSON
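When $v$ is an array of $N \ge 1$ elements:
$$L_{\text{json}}(a) = 2 + \sum_{i=1}^{N} L_{\text{json}}(v_i) + (N - 1)$$
Here 2 counts the brackets and $N - 1$ counts the commas; an empty array is just [] (2 bytes).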
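The recursive definition for JSON can be written out as a short Python sketch and checked against a real serializer. The function name json_len is hypothetical, and the sketch assumes strings that need no escaping and numbers that print without exponent notation.

```python
import json

def json_len(value) -> int:
    """Structural byte count of compact JSON (no escapes, no exponents)."""
    if value is None:
        return 4                                   # null
    if isinstance(value, bool):                    # check bool before int
        return 4 if value else 5                   # true / false
    if isinstance(value, (int, float)):
        return len(json.dumps(value))              # decimal digits
    if isinstance(value, str):
        return len(value.encode("utf-8")) + 2      # two quotes
    if isinstance(value, list):
        commas = max(len(value) - 1, 0)
        return 2 + sum(json_len(v) for v in value) + commas
    if isinstance(value, dict):
        commas = max(len(value) - 1, 0)
        fields = sum(len(k.encode("utf-8")) + 3 + json_len(v)
                     for k, v in value.items())    # 3 = two key quotes + colon
        return 2 + fields + commas
    raise TypeError(f"unsupported type: {type(value).__name__}")

sample = {"items": [{"id": 1, "qty": 5}, {"id": 2, "qty": 3}]}
print(json_len(sample))  # 45
```

The result agrees with len(json.dumps(sample, separators=(",", ":"))) for inputs that stay within the stated assumptions.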
TOON Size Functions
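For a flat object of $n$ fields at the root, with keys $k_i$ and values $v_i$:
$$L_{\text{toon}}(o) = \sum_{i=1}^{n} \bigl(|k_i| + 2 + L_{\text{toon}}(v_i)\bigr) + (n - 1)$$
Where $|k_i|$ is the byte length of the unquoted key, 2 counts the colon and the space after it, and $n - 1$ counts the newlines between lines. A root object needs no braces.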
Primitive Values in TOON
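When $v$ is a primitive value:
| Type | Formula |
|---|---|
| String (normal) | $\lvert s \rvert$ (no quotes) |
| String (looks like number/boolean) | $\lvert s \rvert + 2$ (two quotes) |
| Number | number of characters in the canonical decimal form |
| Boolean | 4 (true) or 5 (false) |
| Null | 4 |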
Simple Arrays in TOON
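Here $L_{\text{toon}}$ measures the whole field line key[N]: ..., not just the array value.
When $v$ is an array of $N \ge 1$ primitive elements stored under key $k$:
$$L_{\text{toon}} = |k| + \operatorname{len}(N) + 4 + \sum_{i=1}^{N} L_{\text{toon}}(v_i) + (N - 1)$$
where $\operatorname{len}(N)$ is the number of decimal digits of $N$, 4 counts the two brackets, the colon, and the space, and $N - 1$ counts the commas.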
Tabular Arrays in TOON
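When $v$ is an array of $N$ objects that share the same $m$ primitive fields $f_1, \dots, f_m$, stored under key $k$ at the root:
$$L_{\text{toon}} = |k| + \operatorname{len}(N) + 5 + \sum_{j=1}^{m} |f_j| + (m - 1) + N(m + 2) + \sum_{r=1}^{N} \sum_{j=1}^{m} L_{\text{toon}}(v_{r,j})$$
where $\operatorname{len}(N)$ is the number of decimal digits of $N$ and 5 counts the brackets, the braces, and the colon of the header.
Note: The term $N(m + 2)$ covers, for each row, one newline, two bytes of indentation, and $m - 1$ commas.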
Efficiency Analysis by Structure
Each subsection below focuses on a particular structure family, states the resulting formula, and shows a small example. Intuitively, TOON tends to win when it can:
- avoid repeating keys (tabular arrays),
- avoid quoting keys and many values,
- and replace braces with indentation,
and tends to lose when it pays a fixed overhead per element (arrays of arrays) or deep indentation (heavily nested configs).
Simple Objects
Flat objects with primitive string values are the easiest win: JSON pays for braces and quoted keys and strings, while TOON drops braces at the root, omits quotes on simple keys, and uses one line per field.
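For objects with only string primitives, where $n$ is the number of fields and $q_i = 2$ if value $v_i$ must be quoted in TOON and $q_i = 0$ otherwise:
$$\Delta_{\text{obj}} = L_{\text{json}} - L_{\text{toon}} = 3n + 2 - \sum_{i=1}^{n} q_i$$
If all values are strings that can be unquoted in TOON, this simplifies to:
$$\Delta_{\text{obj}} = 3n + 2$$
Per field, JSON spends 6 structural bytes (four quotes, a colon, a comma) and TOON spends 3 (a colon, a space, a newline); the constant 2 comes from the braces.
Example: For an object with 1,000,000 fields, TOON saves 3,000,002 bytes ≈ 2.86 MiB.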
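The closed form for flat objects with TOON-safe string values, 3n + 2 bytes saved for n fields, can be checked numerically. The sketch below builds the TOON text by hand under the canonical-formatting assumptions rather than calling a TOON library; the key and value names are arbitrary.

```python
import json

def simple_object_delta(n: int) -> int:
    """Measured JSON bytes minus TOON bytes for n TOON-safe string fields."""
    obj = {f"k{i}": f"v{i}" for i in range(n)}
    compact = json.dumps(obj, separators=(",", ":"))
    # Canonical TOON: one "key: value" line per field, nothing quoted.
    toon = "\n".join(f"{k}: {v}" for k, v in obj.items())
    return len(compact.encode("utf-8")) - len(toon.encode("utf-8"))

for n in (1, 2, 10, 1000):
    assert simple_object_delta(n) == 3 * n + 2

print(simple_object_delta(1000))  # 3002
```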
Empirical Validation
json
{ "id": 1, "name": "Ada" }
yaml
id: 1
name: Ada
Nested Objects
Adding a wrapper object (one extra level of nesting) introduces extra braces for JSON and extra indentation and newlines for TOON. For a single level of nesting with primitive values, TOON still comes out ahead, but the net advantage is smaller.
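For a single level of nesting with primitives, taking one wrapper key that holds $n$ fields with TOON-safe string values:
$$\Delta_{\text{nested}} = L_{\text{json}} - L_{\text{toon}} = n + 5$$
Each nested field now saves 1 byte instead of 3, because its line carries 2 bytes of indentation; the wrapper contributes the constant 5.
Example: For a wrapper around 1,000,000 fields (depth 1), TOON saves 1,000,005 bytes ≈ 0.95 MiB.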
Caveat
This formula is for a single nesting level. Each additional nesting level adds 2 spaces of indentation per nested line; at sufficient depth, compact JSON can become smaller, especially when tabular opportunities disappear (see When Not to Use TOON and the "Deeply nested configuration" dataset in Benchmarks).
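The depth effect can be illustrated with a small sketch. The Python below wraps a three-field leaf with numeric values in a chain of wrapper objects, writes the canonical TOON by hand, and compares byte counts. The wrapper key w and the leaf contents are arbitrary choices for illustration.

```python
import json

LEAF = {"a": 1, "b": 2, "c": 3}  # numeric values: same bytes in both formats

def json_nested(depth: int) -> str:
    """Compact JSON for LEAF wrapped in `depth` single-key objects."""
    value = LEAF
    for _ in range(depth):
        value = {"w": value}
    return json.dumps(value, separators=(",", ":"))

def toon_nested(depth: int) -> str:
    """Canonical TOON for the same value: 2 spaces of indent per level."""
    lines = ["  " * level + "w:" for level in range(depth)]
    lines += ["  " * depth + f"{k}: {v}" for k, v in LEAF.items()]
    return "\n".join(lines)

# JSON bytes minus TOON bytes at depth 0, 1, 2, 3
deltas = [len(json_nested(d)) - len(toon_nested(d)) for d in range(4)]
print(deltas)  # [5, 2, -3, -10]
```

With numeric values there are no quotes to save, so in this toy case the advantage is gone by depth 2; leaves dominated by TOON-safe strings hold out longer.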
Empirical Validation
json
{ "user": { "id": 1, "name": "Ada" } }
yaml
user:
  id: 1
  name: Ada
Primitive Arrays
For arrays of string primitives, JSON writes ["foo","bar","baz"], quoting every string and using [] for the array. TOON writes key[N]: foo,bar,baz, paying once for the length marker but omitting most quotes.
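For arrays of $N$ primitive elements stored under a key, where $\operatorname{len}(N)$ is the number of decimal digits of $N$:
$$\Delta_{\text{arr}} = L_{\text{json}} - L_{\text{toon}} = 3 - \operatorname{len}(N) + \sum_{i=1}^{N} \bigl(L_{\text{json}}(v_i) - L_{\text{toon}}(v_i)\bigr)$$
With string values that can be unquoted in TOON, this simplifies to:
$$\Delta_{\text{arr}} = 2N + 3 - \operatorname{len}(N)$$
Example: For 1,000,000 elements, TOON saves 1,999,996 bytes ≈ 1.91 MiB.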
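As a quick numerical check of the primitive-array case, the sketch below compares measured sizes with the closed form 2N + 3 - len(N), where len(N) is the digit count of N. The TOON line is assembled by hand under the canonical-formatting assumptions, and the key name tags is taken from the example in this section.

```python
import json

def primitive_array_delta(n: int) -> int:
    """Measured JSON bytes minus TOON bytes for n TOON-safe strings under a key."""
    values = [f"v{i}" for i in range(n)]
    compact = json.dumps({"tags": values}, separators=(",", ":"))
    toon = f"tags[{n}]: " + ",".join(values)   # key[N]: v1,v2,...
    return len(compact.encode("utf-8")) - len(toon.encode("utf-8"))

for n in (1, 3, 10, 100, 1000):
    predicted = 2 * n + 3 - len(str(n))
    assert primitive_array_delta(n) == predicted

print(primitive_array_delta(3))  # 8
```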
Empirical Validation
json
{ "tags": ["foo", "bar", "baz"] }
yaml
tags[3]: foo,bar,baz
Root Arrays
At the root, JSON writes ["x","y","z"]; TOON writes [3]: x,y,z. There is no object key cost, so the advantage mainly comes from not quoting TOON-safe strings and from replacing [] with [N]:.
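For root-level arrays of $N$ strings that can be unquoted in TOON, where $\operatorname{len}(N)$ is the number of decimal digits of $N$:
$$\Delta_{\text{root}} = L_{\text{json}} - L_{\text{toon}} = 2N - 2 - \operatorname{len}(N)$$
Example: For 1,000,000 elements, TOON saves 1,999,991 bytes ≈ 1.91 MiB.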
Empirical Validation
json
["x", "y", "z"]
yaml
[3]: x,y,z
Tabular Arrays
Uniform arrays of objects are TOON's sweet spot. JSON repeats every key for every row, while TOON declares the length and column names once (key[N]{id,qty,...}:) and streams rows as bare values.
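For arrays of objects with $N$ rows and $m$ fields, stored under a key, where every field name has length $|k|$, values take the same number of bytes in both formats, and $\operatorname{len}(N)$ is the number of decimal digits of $N$:
$$\Delta_{\text{tab}} = L_{\text{json}} - L_{\text{toon}} = 2 + N m (|k| + 3) - m(|k| + 1) - \operatorname{len}(N)$$
Example: For 1,000,000 rows with 2 fields and 3-character field names, TOON saves 11,999,987 bytes ≈ 11.44 MiB.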
This is where TOON's design (declare fields once, stream rows) pays off most strongly: savings grow linearly with both row count and field count.
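The tabular case can be checked with a short sketch that accepts field names of different lengths. It uses integer cell values (identical in both formats) and builds the TOON header and rows by hand; the predicted function is a closed form derived under this page's assumptions, with F standing for the total length of all field names.

```python
import json

def tabular_delta(rows: int, fields: list[str]) -> int:
    """Measured JSON bytes minus TOON bytes for a uniform array of objects."""
    data = [{f: r + j for j, f in enumerate(fields)} for r in range(rows)]
    compact = json.dumps({"items": data}, separators=(",", ":"))
    header = f"items[{rows}]{{{','.join(fields)}}}:"        # items[N]{f1,f2}:
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in data]
    toon = "\n".join([header, *body])
    return len(compact.encode("utf-8")) - len(toon.encode("utf-8"))

def predicted(rows: int, fields: list[str]) -> int:
    m = len(fields)
    F = sum(len(f) for f in fields)
    return 2 + rows * (3 * m + F) - F - m - len(str(rows))

for rows, fields in [(2, ["id", "qty"]), (10, ["id", "qty", "sku"]), (1000, ["a"])]:
    assert tabular_delta(rows, fields) == predicted(rows, fields)

print(tabular_delta(2, ["id", "qty"]))  # 16
```

When all names share one length, F equals m times that length and the expression reduces to the per-name form used for the 1,000,000-row example.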
Empirical Validation
json
{ "items": [{ "id": 1, "qty": 5 }, { "id": 2, "qty": 3 }] }
yaml
items[2]{id,qty}:
  1,5
  2,3
Arrays of Arrays
Arrays of arrays of primitives are where TOON structurally loses: each inner array becomes a list item with its own header, so TOON pays a fixed overhead per inner array ("- " plus "[m]: "), while JSON just uses commas.
Practical Note
For arrays of arrays of primitives, this model predicts that JSON is more byte-efficient than TOON, because TOON pays ~6 extra bytes per inner array (2 for "- ", 4 for "[m]: "), plus the length marker.
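The per-inner-array overhead can be made concrete with a sketch. For N inner arrays of m integers each (values identical in both formats), counting bytes under this page's assumptions gives a difference of 3 - len(N) - N(6 + len(m)), where len is a digit count. This closed form is derived for the sketch, not quoted from the source, and the key name pairs follows the example in this section.

```python
import json

def nested_array_delta(count: int, width: int) -> int:
    """Measured JSON bytes minus TOON bytes for `count` inner arrays of ints."""
    data = [[i + j for j in range(width)] for i in range(count)]
    compact = json.dumps({"pairs": data}, separators=(",", ":"))
    lines = [f"pairs[{count}]:"]
    # Each inner array is a list item with its own header: "  - [m]: a,b,..."
    lines += [f"  - [{width}]: " + ",".join(map(str, inner)) for inner in data]
    toon = "\n".join(lines)
    return len(compact.encode("utf-8")) - len(toon.encode("utf-8"))

def predicted(count: int, width: int) -> int:
    return 3 - len(str(count)) - count * (6 + len(str(width)))

for count, width in [(2, 2), (10, 3), (100, 12)]:
    assert nested_array_delta(count, width) == predicted(count, width)

print(nested_array_delta(2, 2))  # -12
```

A negative result means compact JSON is smaller, by roughly 7 bytes per inner array when m has a single digit.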
For arrays of arrays with
With string primitives and
Example: For 1,000,000 arrays with
Empirical Validation
json
{ "pairs": [[1, 2], [3, 4]] }
yaml
pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
Strings That Look Like Literals
Strings that look like numbers or booleans (e.g. "123", "true") must be quoted in both JSON and TOON, slightly reducing TOON's advantage because it no longer saves quotes on those values.
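A simplified sketch of this quoting rule is shown below. The pattern and the reserved-word set are assumptions made for illustration: the TOON specification lists further cases that force quoting (for example delimiters or leading and trailing spaces), and string escapes are ignored here.

```python
import re

# Assumed, simplified rule: quote strings that would parse as a number,
# a boolean, or null. The real specification covers more cases.
_NUMERIC = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")
_RESERVED = {"true", "false", "null"}

def json_string_len(s: str) -> int:
    """JSON always quotes strings: body plus two quotes."""
    return len(s.encode("utf-8")) + 2

def toon_string_len(s: str) -> int:
    """TOON quotes only when the bare text would be read as a literal."""
    looks_like_literal = s in _RESERVED or _NUMERIC.fullmatch(s) is not None
    body = len(s.encode("utf-8"))
    return body + 2 if looks_like_literal else body

for s in ("Ada", "123", "true"):
    print(s, json_string_len(s), toon_string_len(s))
# Ada 5 3
# 123 5 5
# true 6 6
```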
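For objects containing only such strings, with $n$ fields:
$$\Delta_{\text{lit}} = L_{\text{json}} - L_{\text{toon}} = n + 2$$
Per field, JSON spends 6 structural bytes (two key quotes, a colon, two value quotes, a comma) and TOON spends 5 (a colon, a space, two value quotes, a newline); the braces add the constant 2. For the two-field example below this gives 34 bytes of compact JSON against 30 bytes of TOON.
Example: For an object with 1,000,000 such fields, TOON saves 1,000,002 bytes ≈ 0.95 MiB.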
Empirical Validation
json
{ "version": "123", "enabled": "true" }
yaml
version: "123"
enabled: "true"
Empty Structures
Empty containers reveal structural differences even at minimal sizes.
Empty Object:
JSON requires {} (2 bytes), whereas a completely empty root object in TOON is represented as an empty document (0 bytes).
Empty Array (field):
For a field named key, JSON uses {"key":[]} in compact form (10 bytes), while TOON uses:
yaml
key[0]:
Under this model, that yields a constant 3-byte advantage for TOON.
Summary Table
The table below summarizes the formulas and which side wins under the modeling assumptions.
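| Structure | Efficiency Formula | TOON Advantage? |
|---|---|---|
| Simple Objects | $\Delta = 3n + 2$ | ✅ Yes |
| Nested Objects (1 level) | $\Delta = n + 5$ | ✅ Yes (shrinks with depth) |
| Primitive Arrays | $\Delta = 2N + 3 - \operatorname{len}(N)$ | ✅ Yes |
| Root Arrays | $\Delta = 2N - 2 - \operatorname{len}(N)$ | ✅ Yes |
| Tabular Arrays | $\Delta = 2 + N m (\lvert k \rvert + 3) - m(\lvert k \rvert + 1) - \operatorname{len}(N)$ | ✅ Best case |
| Arrays of Arrays | $\Delta = 3 - \operatorname{len}(N) - N(6 + \operatorname{len}(m))$ | ❌ JSON wins here |
| String Literals | $\Delta = n + 2$ | ✅ Yes (smaller gain) |
| Empty Structures | $\Delta = 2$ (root object), $\Delta = 3$ (array field) | ✅ Yes |
Here $\Delta$ is JSON bytes minus TOON bytes, $n$ is the number of fields, $N$ the number of elements, rows, or inner arrays, $m$ the number of columns or inner elements, $\lvert k \rvert$ the field-name length, and $\operatorname{len}(\cdot)$ a decimal digit count. String values are assumed unquoted in TOON except in the String Literals row; the Tabular Arrays and Arrays of Arrays rows assume values that take the same bytes in both formats.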
In short:
- TOON's gains are linear in the number of fields for flat objects.
- For arrays, gains grow linearly in the number of elements, and for tabular arrays linearly in both rows and fields.
- Arrays of arrays are the main structural case where JSON is smaller.
- Deep nesting and heavy quoting can erode or reverse these advantages in real data.
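The worked examples on this page can be reproduced with a few lines of arithmetic. The closed forms below are the ones consistent with the example figures quoted above (they are restated here as assumptions of the sketch), and the conversion shows that the "MB" values correspond to 2^20-byte units.

```python
def digits(n: int) -> int:
    """Number of decimal digits of a non-negative integer."""
    return len(str(n))

N = 1_000_000  # fields, elements, or rows, depending on the structure

savings = {
    "simple objects": 3 * N + 2,
    "nested objects (1 level)": N + 5,
    "primitive arrays": 2 * N + 3 - digits(N),
    "root arrays": 2 * N - 2 - digits(N),
    # 2 fields, 3-character field names
    "tabular arrays": 2 + N * 2 * (3 + 3) - 2 * (3 + 1) - digits(N),
}

for name, saved in savings.items():
    print(f"{name}: {saved:,} bytes = {saved / 2**20:.2f} MiB")
# simple objects: 3,000,002 bytes = 2.86 MiB
# nested objects (1 level): 1,000,005 bytes = 0.95 MiB
# primitive arrays: 1,999,996 bytes = 1.91 MiB
# root arrays: 1,999,991 bytes = 1.91 MiB
# tabular arrays: 11,999,987 bytes = 11.44 MiB
```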
Conclusion
This simplified theoretical model supports TOON's design goal: structurally, it reduces overhead compared to compact JSON in many common patterns by:
- avoiding repeated keys in tabular arrays,
- omitting quotes on many keys and values,
- and replacing braces with indentation at shallow depths.
For the structure families examined here and under the stated assumptions, the structural overhead of TOON is lower than that of compact JSON except for arrays of arrays. Since UTF-8 byte length is a reasonable first-order proxy for tokens, these structural savings usually translate into lower token counts in those patterns.
At the same time, this is deliberately a simplified model. In real datasets, additional factors – deeper or irregular nesting, heavily quoted strings, exponent notation in JSON, and tokenizer idiosyncrasies – can reduce or even reverse these gains. Our Benchmarks and When Not to Use TOON show that compact JSON can be more efficient for deeply nested or low-tabularity data. Use this page as intuition for why TOON behaves the way it does, not as a universal guarantee.
- Benchmarks – Empirical token count and accuracy comparisons across formats
- Specification – Formal TOON specification
References
This analysis is based on:
- Original Research: TOON vs. JSON: A Mathematical Evaluation of Byte Efficiency in Structured Data
- TOON Specification: toon-format/spec
- JSON Specification: RFC 8259, ECMA-404
This page was contributed by Mateo Lafalce (@mateolafalce).
Have questions or found an error in the formalization? Open an issue on GitHub or contribute improvements to this analysis.