ClickHouse has full native JSON support, and has since v25.3. The JSON type stores each path as a separate columnar subcolumn with native type preservation, primary key indexing, and selective path reads. It is 2,500x faster than MongoDB for aggregations on the JSONBench 1-billion-document benchmark. The narrative that “ClickHouse can’t do JSON” is outdated by two years and 80+ merged PRs.
We analyzed 80+ GitHub pull requests, official ClickHouse changelogs, release blogs, and third-party benchmarks to trace the full evolution of JSON support from string-based functions through the modern native JSON type.
In 2021, the criticism had some basis. JSON was stored as opaque String blobs, queried via JSONExtract* functions that required full column scans on every query. The experimental Object('json') type shipped in 2022 but suffered from eager type unification, unbounded column explosion, and race conditions.
By early 2026, ClickHouse ships a production-ready native JSON type built on three foundational types (Variant, Dynamic, JSON), with configurable path limits, type hints, primary key support for JSON subcolumns, three generations of storage serialization, and a query planner that reads only the specific JSON paths your query needs. None of this requires manual schema management.
The single highest-impact storage change is advanced shared data serialization (PR #83777), which delivered 58x faster reads and 3,300x less memory for selective path access by introducing per-granule metadata with path indexes.
The native JSON type stores each path as a separate Dynamic-typed subcolumn in columnar format. The result: 2,500x faster than MongoDB for aggregations, 10x faster than Elasticsearch, and 9,000x faster than DuckDB/PostgreSQL for analytics on the same dataset, according to the JSONBench benchmark on 1 billion Bluesky documents.
The JSON type reached GA in ClickHouse 25.3 (PR #77785), with experimental flags removed and the type backported to the LTS release. The legacy Object('json') type was fully removed in v25.11 (PR #85718).
Verdict: the “ClickHouse doesn’t do JSON” advice referenced a system that no longer exists. The current JSON type is a ground-up columnar implementation that preserves native types, supports primary key indexing, and reads only the paths you query. Repeating the old criticism in 2026 is misinformation.
If you’ve evaluated ClickHouse for semi-structured data, you’ve heard the warnings:
“ClickHouse doesn’t support JSON natively”
“Flatten JSON into columns manually”
“Use JSONExtract functions on String columns” (as the primary approach)
“Use Object('JSON')” (deprecated type)
“No native JSON support”
Some of these started as legitimate observations circa 2021-2022. ClickHouse did store JSON as Strings. The JSONExtract* functions did scan the full column. The first attempt at a native type, Object('json'), did have serious architectural flaws.
Others were amplified by competitors who found a convenient story: ClickHouse is fast for scans, but it can’t handle semi-structured data.
Then ClickHouse’s engineering team spent three years building one of the most sophisticated columnar JSON implementations in any database. Over 80 significant pull requests merged. They built three new foundational types (Variant, Dynamic, JSON), three generations of storage serialization, a query planner that reads only needed subcolumns, primary key and skip index support for JSON paths, and clear migration paths from every legacy representation.
This article traces that evolution with PR-level evidence. No marketing claims. No benchmarks on toy datasets. Just the commit history.
We went through ClickHouse’s GitHub commit history, pull requests, changelogs, and release blogs from 2019 through early 2026. The scope covered every PR that touched JSON handling: type implementations, storage formats, function changes, planner optimizations, memory improvements, correctness fixes, and migration paths.
Each PR was classified by category (type system, storage, functions, planner, correctness, migration), impact severity, and whether it changed default behavior. We cross-referenced PR descriptions against changelog entries and benchmark results to verify claimed improvements. Where multiple PRs addressed the same subsystem, we traced the dependency chain to understand how incremental changes compounded.
The result is a ranked analysis of 80+ pull requests organized into six phases, with full provenance. Every claim in this article maps to a specific merged PR that you can verify yourself on GitHub.
The current state, as of early 2026:
Native JSON data type (GA since v25.3): Each JSON path is stored as a separate Dynamic-typed subcolumn in columnar format. Full SQL query, filter, and aggregation support on JSON fields, including nested structures and arrays (Array(JSON)). Configurable max_dynamic_paths (default 1024) and max_dynamic_types (default 32) control resource usage. Known paths can be materialized as physical columns with type hints (JSON(key1 UInt32, key2 String)), while unknown paths are automatically discovered with type inference. Path filtering via SKIP and SKIP REGEXP provides fine-grained schema control.
Three foundational types: Variant (PR #58047) provides discriminated unions. Dynamic (PR #63058) extends Variant with open-ended type storage. JSON (PR #66444) combines both to store semi-structured data with native type preservation.
Primary key and skip index support: JSON subcolumns can appear in ORDER BY and data-skipping index expressions (PR #72644), enabling the same data pruning that ClickHouse applies to regular typed columns.
Advanced shared data serialization: Per-granule path indexes enable selective reads of specific paths without scanning the entire JSON column (PR #83777). Three serialization modes optimized for different access patterns.
Planner-level subcolumn optimization: The query planner reads only the JSON paths referenced in your query (PR #68053), pushes subcolumn requirements through CTEs and views (PR #94105), and rewrites JSONExtract calls into direct subcolumn reads (PR #96711).
Full JSONExtract interop: All JSONExtract* functions work with native JSON columns (PR #96711). Introspection functions (distinctJSONPaths, distinctJSONPathsAndTypes) provide schema discovery from metadata alone (PR #68463, PR #92196).
Migration from every legacy format: ALTER TABLE ... MODIFY COLUMN converts String, Object('json'), Map, and Tuple columns to the native JSON type (PR #70442, PR #71784, PR #71320).
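A minimal sketch of what these features look like together. The table, column, and path names here are illustrative, not taken from the article:

```sql
-- Hypothetical events table: one typed hint for a known path,
-- SKIP rules to keep noisy paths out of columnar storage.
CREATE TABLE events
(
    id UInt64,
    data JSON(
        max_dynamic_paths = 1024,    -- default, shown explicitly
        user_id UInt32,              -- known path materialized as a typed column
        SKIP debug.trace,            -- exclude one noisy path
        SKIP REGEXP '^tmp\\.'        -- exclude all paths matching a pattern
    )
)
ENGINE = MergeTree
ORDER BY id;

-- Dot notation reads only the subcolumns the query touches.
SELECT data.user_id, data.score
FROM events
WHERE data.user_id = 42;
```

Unknown paths such as data.score above are discovered and typed automatically at insert time.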
These are not experimental features behind flags. They are defaults that ship with every ClickHouse installation since v25.3.
The FUD: “Use JSONExtract functions on String columns”
In this era, the criticism was fair. ClickHouse stored JSON as opaque String blobs, and every JSON query required parsing the entire string value.
PR #5235 introduced the JSONExtract* function family, powered by simdjson with a RapidJSON fallback. This was a meaningful step: SIMD instructions allowed structural element identification at near-memory-bandwidth speeds.
But the fundamental limitation remained. Every query, no matter which field it accessed, required scanning and parsing the full JSON string column. There was no way to read just event.user_id without also reading event.metadata, event.payload, and every other field.
ClickHouse provided two function families with different trade-offs:
simpleJSON / visitParam: Minimalist heuristic parsing with low CPU overhead, but strict assumptions about canonical encoding and no nested object support.
JSONExtract*: Full simdjson-powered parsing with standards-compliant extraction, but high per-row CPU cost from full document parsing.
Neither approach could avoid the core problem: 100% column scan for every query.
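The pattern of that era can be sketched as follows; table and field names are hypothetical:

```sql
-- Pre-native-JSON pattern: JSON stored as an opaque String.
CREATE TABLE events_legacy
(
    raw String
)
ENGINE = MergeTree
ORDER BY tuple();

-- Every access parses the full document for every row,
-- even when only one field is needed.
SELECT JSONExtractString(raw, 'event', 'user_id') FROM events_legacy;
SELECT count() FROM events_legacy WHERE JSONExtractInt(raw, 'status') = 200;
```

Both queries read and parse 100% of the raw column regardless of which field they touch, which is the core limitation the native type later removed.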
PR #24148 added JSON_VALUE, JSON_QUERY, and JSON_EXISTS with JSONPath expression support, bringing ClickHouse closer to SQL/JSON standard compliance. This improved SQL compatibility but did not change the underlying storage model. JSON was still strings.
The Map(String, String) type offered some improvement by storing JSON key-value pairs natively, eliminating the need for string parsing on every access. But it still required reading all keys to find one entry, and it lost all type information by collapsing everything to strings.
By the end of 2021, ClickHouse had capable JSON parsing functions but no native JSON storage. The gap was real, and the engineering team knew it.
The FUD: “Use Object('JSON')”
PR #23932, merged March 2022 by Anton Popov, was the first attempt at native columnar JSON storage. It shipped in ClickHouse 22.3 LTS under allow_experimental_object_type. The implementation spanned 101 commits and proved a critical concept: JSON could be stored with each path as a separate subcolumn.
But it had serious architectural flaws: eager type unification collapsed mixed-type paths into a least common type (usually String), column counts grew without bound as new paths appeared, and the implementation suffered from race conditions.
Despite these flaws, Object('json') validated the demand for native JSON storage and identified every architectural challenge the replacement would need to solve.
Alongside the type work, ClickHouse continued improving JSON ecosystem support. PR #40910 introduced the JSONObjectEachRow format for keyed JSON objects. PR #39186 added automatic type inference from JSON strings, detecting dates, datetimes, and integers by default. PR #54427 enabled schema inference of JSON objects as named Tuples.
These format and inference improvements meant ClickHouse was getting better at ingesting JSON data. What it still lacked was a sound way to store it.
The FUD: “JSON types lose type information”
Rather than patching Object(’json’), ClickHouse built from first principles. The redesign started with two new foundational types that solved the type-preservation problem that had plagued the original implementation.
PR #58047, by Pavel Kruglov, introduced Variant(T1, T2, ..., TN), a discriminated union storing values of different types in a single column. It uses a UInt8 discriminator column plus dense subcolumns per type variant, supporting up to 255 variants. The PR included 47 commits and roughly 5,000 lines of tests.
This solved the type-unification problem that killed Object(’json’). Instead of collapsing 42 and “hello” at the same path into String, Variant stores them in their native types with a discriminator indicating which type each row contains.
A follow-up optimization (PR #62774) introduced compact discriminator serialization: when all discriminators in a granule are the same type (the common case for JSON paths), it stores 3 values instead of 8,192. This is highly effective in practice since most JSON paths have homogeneous types within a granule.
PR #63058, also by Pavel Kruglov, extended Variant with an open, self-describing type set. Dynamic has a max_types parameter (default 32); the most frequent types get their own Variant slots, and overflow types are stored in a SharedVariant as binary-encoded strings. This provided the flexibility that JSON demands without the unbounded explosion that doomed Object('json').
The PR included 39 commits and introduced the dynamicType() introspection function. A dynamic_structure.bin metadata file per data part tracks the type composition.
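A short sketch of the two types (table names are hypothetical, and exact literal-to-variant conversion behavior may vary by version):

```sql
-- Variant: a closed union; per-type subcolumns plus a discriminator.
CREATE TABLE v_demo
(
    v Variant(UInt64, String, Array(UInt64))
)
ENGINE = Memory;

INSERT INTO v_demo VALUES (42), ('hello'), ([1, 2, 3]);

-- variantType() reveals which branch of the union each row holds.
SELECT v, variantType(v) FROM v_demo;

-- Dynamic: an open type set; types beyond max_types overflow
-- into a shared variant stored as binary-encoded strings.
CREATE TABLE d_demo
(
    d Dynamic(max_types = 32)
)
ENGINE = Memory;

INSERT INTO d_demo VALUES (42), ('hello'), (now());

SELECT d, dynamicType(d) FROM d_demo;
```

The key property: 42 and 'hello' at the same position keep their native types instead of collapsing to String.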
These two types, Variant and Dynamic, were the architectural foundation. The JSON type would combine them both.
The FUD: “ClickHouse doesn’t support JSON natively”
PR #66444 is the single most important commit in ClickHouse’s JSON evolution. Authored by Pavel Kruglov, it implements the entirely new JSON data type in 91 commits, closing RFC #54864 (“Semistructured Columns”) authored by Alexey Milovidov.
The design works as follows. JSON paths are flattened into individual Dynamic-typed subcolumns, each stored in separate column files per data part. Paths exceeding max_dynamic_paths (default 1024) overflow into a shared data structure. The type supports:
Full SQL support: Query, filter, and aggregate on any JSON field using standard SQL. Nested structures and arrays are first-class citizens.
Configurable limits: max_dynamic_paths (default 1024) and max_dynamic_types (default 32) control resource usage.
Materialized known paths: Type hints like JSON(key1 UInt32, key2 String) materialize known paths as physical typed columns for maximum performance, while unknown paths are automatically created with type inference as they are discovered.
Path filtering: SKIP and SKIP REGEXP to exclude noisy paths from columnar storage.
Dot-notation access: json.a.b for direct path reads.
Sub-object access: json.^prefix for extracting JSON subtrees.
Array(JSON) support: Nested structures and arrays of JSON documents.
Efficient data skipping on dynamic paths: JSON subcolumns in primary keys and skip indexes enable granule-level pruning.
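The access syntax can be sketched on a hypothetical table (names illustrative):

```sql
CREATE TABLE docs
(
    json JSON
)
ENGINE = Memory;

INSERT INTO docs VALUES ('{"a": {"b": 7, "c": "x"}, "tags": ["p", "q"]}');

SELECT
    json.a.b,            -- direct dot-notation path read, returns Dynamic
    json.a.b.:Int64,     -- the same path read as a concrete requested type
    json.^a              -- sub-object access: the whole subtree under "a" as JSON
FROM docs;
```

Typed access with .:Type returns NULL when the stored type does not match, so the Int64 read above assumes the path holds integers.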
First shipped in ClickHouse 24.8 LTS under allow_experimental_json_type. The official blog post “How we built a new powerful JSON data type for ClickHouse” (October 2024) detailed the architecture.
PR #69272 addressed a critical production concern: memory consumption during JSON inserts. Before this PR, inserting JSON data consumed 6.99 GiB of memory. After, 354 MiB. A 20x reduction.
The fix was adaptive write buffer sizing. Buffers start at 16 KiB and grow exponentially to a maximum of 1 MiB, selectively enabled for dynamic substreams. S3 inserts improved from 23.13 GiB to 7.65 GiB. No throughput regression.
PR #70442 delivered two major changes. First, ALTER TABLE ... MODIFY COLUMN col JSON to convert existing String columns to the JSON type. Conversion happens during background merges, so there is no downtime. Second, Serialization V2 for JSON and Dynamic types with an improved binary layout.
This was the beginning of clear migration paths. Teams no longer had to reimport data to adopt native JSON.
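A sketch of the migration path, assuming a hypothetical logs table with a String column named payload:

```sql
-- In-place conversion: data is rewritten during background merges,
-- so no reimport and no downtime (PR #70442).
ALTER TABLE logs MODIFY COLUMN payload JSON;

-- The target type can carry limits and hints, like any JSON column.
ALTER TABLE logs MODIFY COLUMN payload JSON(max_dynamic_paths = 2048, status UInt16);
```

After the merge-driven rewrite completes, queries on payload read only the paths they reference instead of parsing the full string.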
PR #66046 refactored the JSONExtract function family to work with the new type, splitting the implementation into reusable JSONExtractTree.h/cpp components and adding Dynamic type support. This ensured that existing queries using JSONExtract would continue to work when columns migrated to native JSON.
PR #68463 added distinctDynamicTypes(), distinctJSONPaths(), and distinctJSONPathsAndTypes(). These are essential schema discovery tools for semi-structured data. They were later optimized in PR #92196 to read only metadata files instead of scanning actual data, making schema diversity views effectively instant.
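On a hypothetical table docs with a JSON column json, schema discovery looks roughly like this:

```sql
-- Aggregate functions over the column; since PR #92196 they are
-- answered from part metadata rather than by scanning data.
SELECT distinctJSONPaths(json) FROM docs;          -- all observed paths
SELECT distinctJSONPathsAndTypes(json) FROM docs;  -- paths with their type sets

-- For a standalone Dynamic column d:
SELECT distinctDynamicTypes(d) FROM d_demo;
```

This is the practical answer to "what shape is my semi-structured data": instant, metadata-only introspection.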
PR #68053 enabled optimize_functions_to_subcolumns by default. This planner optimization rewrites function calls to read only the specific subcolumns needed, which is transformative for JSON queries. A query accessing json.user.id reads only that subcolumn’s data, not the entire JSON column.
PR #72294 moved JSON, Dynamic, and Variant to beta status, backported to 24.11. This signaled production readiness for early adopters.
PR #72644 was a milestone for performance. It enabled JSON subcolumns (json.path.to.key) in ORDER BY expressions and data-skipping index definitions. This means ClickHouse applies the same data pruning to JSON fields that it applies to regular typed columns.
The JSONBench benchmark uses this capability for sub-second queries over 1 billion documents. Without it, JSON columns could not participate in ClickHouse’s primary mechanism for reducing scan ranges.
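A sketch modeled on a JSONBench-style schema (table and path names illustrative):

```sql
-- JSON subcolumns in the sorting key (PR #72644): the primary index
-- prunes granules on JSON fields exactly as on regular typed columns.
CREATE TABLE bluesky
(
    data JSON(
        kind LowCardinality(String),
        commit.collection LowCardinality(String)
    )
)
ENGINE = MergeTree
ORDER BY (data.kind, data.commit.collection);

-- This filter can skip whole granules via the primary index.
SELECT count()
FROM bluesky
WHERE data.kind = 'commit' AND data.commit.collection = 'app.bsky.feed.post';
```

Type hints on the key paths matter here: a sorting key needs stable, concrete types rather than Dynamic values.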
By the end of 2024, clear migration routes existed for every semi-structured representation: String, Object('json'), Map, and Tuple columns could all be converted in place via ALTER TABLE ... MODIFY COLUMN.
The FUD: “JSON queries require full column scans”
PR #77785, authored by Alexey Milovidov and expanded by Pavel Kruglov, removed all experimental and beta gates for JSON, Dynamic, and Variant. The commit message references https://jsonbench.com/. Backported to ClickHouse 25.3 LTS via cherry-pick PRs #77974 and #77975.
The 25.3 release blog stated: “About 1.5 years ago, we weren’t happy with our JSON implementation, so we returned to the drawing board.”
PR #77640 addressed memory consumption during read-ahead prefetches of JSON columns in Wide parts. Before: SELECT * WHERE y=1 on 1 million rows with 1,000 JSON paths consumed 69.16 GiB peak memory. After: 1.11 GiB. A 63x reduction.
PR #74827 introduced prefetches for subcolumn prefix deserialization, a cache for deserialized prefixes, and parallel prefix deserialization for JSON columns on S3. The result: 4x faster full scans and roughly 10x faster LIMIT 10 queries on remote storage. This introduced MergeTreePrefixesDeserializationThreadPool and benefits any remote filesystem with similar latency characteristics.
PR #83777 is the most impactful storage optimization in the JSON type’s history. It introduced three serialization modes for shared data (the overflow storage for paths beyond max_dynamic_paths), ranging from the simple map mode to the new advanced mode.
The advanced mode creates per-granule .structure, .data, and .paths_marks files with a path index that enables direct lookup of specific paths without scanning the entire shared data structure.
The benchmarks speak for themselves. Reading a single key from 200,000 rows with 10,000 unique paths improved from 3.63s / 12.53 GiB to 0.063s / 3.89 MiB. That is 58x faster and 3,300x less memory. For Compact parts, non-existing key reads improved from 3.4s to 0.3s (roughly 11x faster), memory from 517 MiB to 3.7 MiB (roughly 140x reduction).
This PR contained 47 commits and is documented in the official ClickHouse blog “Making complex JSON 58x faster, use 3,300x less memory.”
PR #77940 added marks for individual substreams within compact parts, extending selective subcolumn read efficiency to the compact storage format. Previously, reading any subcolumn from a compact part required reading the entire part.
PR #85934 marked the experimental and beta JSON settings as obsolete. JSON was now unconditionally enabled.
PR #85718 removed the deprecated Object('json') implementation entirely. 270 files changed. ColumnObjectDeprecated, DataTypeObjectDeprecated, deprecated serialization files, the JSONDataParser, and all legacy tests were deleted. This was backward-incompatible: any tables or queries referencing Object('json') must be migrated before upgrading past v25.11.
The FUD: “JSONExtract on String columns is the primary approach”
PR #96711, by Fisnik Kastrati, extended all JSONExtract*, JSONHas, JSONLength, and JSONType functions to accept native JSON columns directly. More importantly, it introduced a FunctionToSubcolumnsPass planner optimization that rewrites constant-path JSONExtract calls into direct subcolumn reads.
This means existing queries that use JSONExtractString(json_col, 'user', 'name') now bypass text parsing entirely when the column is a native JSON type. The planner rewrites the call to a direct subcolumn read of json_col.user.name. This closed issue #88370.
PR #96927 added an optimized has(json_col, 'path') function for fast path-existence checks without text parsing. This is essential for queries that filter based on whether a JSON path exists.
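Roughly, on a hypothetical table t with a native JSON column json_col:

```sql
-- Legacy-style call on a native JSON column: the planner rewrites
-- the constant-path extract into a direct subcolumn read (PR #96711),
-- equivalent to SELECT json_col.user.name.
SELECT JSONExtractString(json_col, 'user', 'name') FROM t;

-- Path-existence check without text parsing (PR #96927);
-- the exact path-argument format here is an assumption.
SELECT count() FROM t WHERE has(json_col, 'user.name');
```

The practical upshot: applications written against the String-era API keep working and silently get columnar performance after migration.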
PR #94105 introduced SubcolumnPushdownPass, which pushes subcolumn requirements through CTEs and views. This means wrapping a JSON table in a view or CTE no longer defeats subcolumn optimizations.
PR #98886 enabled bloom and text skip indexes on JSONAllPaths(), allowing efficient filtering on JSON key presence. This gives ClickHouse the ability to skip entire granules when querying for documents that contain (or don’t contain) specific paths.
PR #97871, by Amos Bird, refactored the tokenizer to a SIMD-ready stateful API, replacing the older iterator API. This lays the groundwork for continued parsing performance improvements.
PR #98788 introduced a unified combined subcolumn that returns Dynamic for both scalar and object access at a path. This simplifies queries that need to handle paths where the value might be a scalar or a nested object.
The JSON type’s performance has been validated by both ClickHouse’s internal benchmarks and independent third-party testing. The numbers come from specific, verifiable sources.
The JSONBench benchmark (https://jsonbench.com/) tested the native JSON type against other databases on 1 billion Bluesky social media documents on a single m6i.8xlarge node: ClickHouse came out 2,500x faster than MongoDB for aggregations, 10x faster than Elasticsearch, and 9,000x faster than DuckDB/PostgreSQL for analytics.
A follow-up benchmark in March 2025 scaled to 4 billion+ documents (1.6 TiB), achieving 91.84 million docs/sec throughput with sub-100ms queries.
SigNoz, a ClickHouse-based observability platform, reported 30% faster log queries with the native JSON type. ClickHouse’s own observability stack (ClickStack) demonstrated 9x faster queries compared to the previous Map-based approach for OpenTelemetry log attributes.
The observability domain is where the JSON type’s impact is most visible. Before native JSON, log management solutions built on ClickHouse flattened attributes into Map(String, String) columns, losing type information. Queries like summing LogAttributes['response_size'] required an explicit cast on every access.
With the native JSON type, OpenTelemetry log attributes preserve their native types, eliminating those per-row casts; that is the difference behind the 9x speedup ClickStack measured over the Map-based approach.
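A hedged before/after sketch (table names hypothetical; the typed subcolumn read assumes the attribute was ingested as Int64):

```sql
-- Map-based attributes (the old approach): every row pays for a
-- string lookup plus a numeric cast before aggregation.
SELECT sum(toFloat64OrZero(LogAttributes['response_size']))
FROM otel_logs_map;

-- Native JSON attributes keep the ingested numeric type; the typed
-- subcolumn read aggregates the columnar data directly.
SELECT sum(LogAttributes.response_size.:Int64)
FROM otel_logs_json;
```

The second query also benefits from subcolumn pruning: only the response_size path is read, not the full attribute blob.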
Fairness matters. A few things still require awareness:
Path explosion requires attention. Without appropriate max_dynamic_paths settings and SKIP rules, high-cardinality JSON (thousands of unique paths per document) can create many subcolumns. Set limits that match your schema shape, and use SKIP REGEXP for noisy paths.
Subcolumn pruning through SELECT * in CTEs is not yet supported. Issue #92455 documents this gap. Explicitly name columns in CTEs over JSON tables for now.
Legacy Object('json') migration is mandatory. PR #85718 enforces a hard removal. After upgrading to v25.12 or later, any tables or queries referencing Object('json') will fail. Audit schemas and run ALTER before upgrading past v25.11.
Correctness fixes are ongoing. Edge cases in JSONExtract interop (issues #102018, #102079), default value handling (issue #101721), and specific format combinations (issue #101911) show that a system this complex requires staying on the latest stable release.
Advanced shared data mode trades write cost for read performance. The per-granule path indexes that enable 58x faster reads add write overhead. For write-heavy workloads with infrequent selective reads, the simpler map mode may be more appropriate.
Type hints and path configuration require understanding your data. The defaults work well for moderate schemas (up to 1,024 unique paths). Workloads with tens of thousands of unique paths need tuning.
These are real engineering trade-offs, and understanding them is part of making an informed decision.
Run the PR and benchmark numbers.
When someone tells you ClickHouse can’t handle JSON in 2026, ask them if they’ve tested against a version that includes the native JSON type (GA since v25.3, PR #77785), primary key support for JSON subcolumns (v24.12, PR #72644), advanced shared data serialization (v25.8, PR #83777), or JSONExtract interop with native JSON columns (v26.2, PR #96711).
If they’re referencing Object('json'), that type was removed in v25.11. If they’re recommending JSONExtract on String columns as the primary approach, the native JSON type has made that unnecessary since v24.8. If they’re telling you to flatten JSON into columns manually, the type does this automatically with configurable limits and type hints.
The commit history doesn’t lie. 80+ pull requests. Three foundational types. Three generations of storage serialization. Primary key indexing. Planner-level subcolumn optimization. 2,500x faster than MongoDB on real-world data.
ClickHouse’s JSON implementation in 2026 bears no resemblance to the string-based functions and experimental Object type that earned those early warnings. The engineers built a ground-up columnar JSON storage system, and the evidence is in the PRs.
Test it on your workload. That’s the only benchmark that matters.
Does ClickHouse support JSON natively?
Yes. ClickHouse’s native JSON type (PR #66444) stores each JSON path as a separate Dynamic-typed subcolumn in columnar format. It reached GA in v25.3 (PR #77785) with all experimental flags removed. The legacy Object('json') type was fully removed in v25.11 (PR #85718).
Which single change mattered most?
Advanced shared data serialization (PR #83777), which delivers 58x faster reads and 3,300x less memory for selective path access. For insert workloads, adaptive write buffers (PR #69272) with 20x memory reduction are equally important.
How does it compare to other databases?
On the JSONBench benchmark (1 billion Bluesky documents, single node), ClickHouse with the native JSON type is 2,500x faster than MongoDB for aggregations, 10x faster than Elasticsearch, and 9,000x faster than DuckDB/PostgreSQL for analytics. Storage is 20% more compact than compressed files and 40% more efficient than MongoDB.
Can existing String columns be migrated without downtime?
Yes. ALTER TABLE ... MODIFY COLUMN col JSON converts String columns to native JSON during background merges with no downtime (PR #70442). After migration, queries read only the paths they need instead of scanning the full string. JSONExtract functions continue to work on native JSON columns (PR #96711) and get rewritten to direct subcolumn reads by the planner.
What was Object('json') and why was it removed?
Object('json') (PR #23932) was ClickHouse’s first attempt at native JSON storage, shipped in 2022. It suffered from type unification issues, metadata explosion, and race conditions. Rather than patching it, ClickHouse built an entirely new implementation from first principles using Variant, Dynamic, and JSON types. Object('json') was fully removed in v25.11 (PR #85718). Tables using it must be migrated via ALTER TABLE ... MODIFY COLUMN ... JSON (PR #71784) before upgrading.
Can JSON fields be used in primary keys and skip indexes?
Yes, since v24.12. PR #72644 enables JSON subcolumns in ORDER BY and data-skipping index expressions. PR #98886 adds bloom and text skip indexes on JSONAllPaths() for efficient key-presence filtering.
How are path limits controlled?
The max_dynamic_paths parameter (default 1024) controls how many paths get their own columnar subcolumn. Paths beyond this limit overflow into shared data storage. The advanced serialization mode (PR #83777) makes shared data reads efficient with per-granule path indexes. Use SKIP and SKIP REGEXP to exclude noisy paths from columnar storage.
Is the JSON type suitable for observability workloads?
Yes. ClickHouse’s own observability stack (ClickStack) uses the native JSON type for OpenTelemetry log attributes, demonstrating 9x faster queries compared to the previous Map-based approach. SigNoz independently validated 30% faster log queries. The type preserves native numeric types, eliminating cast overhead for aggregations on log attributes.
Analysis based on 80+ GitHub pull requests, official ClickHouse changelogs, release blogs, and third-party benchmarks covering the period 2019-2026. Every claim maps to a specific merged PR. Verify the evidence yourself -- the commit history is public.