From XML to JSON to CBOR

cborbook.com

81 points by GarethX a day ago


mrbluecoat - a day ago

Feels like a CBOR ad to me. I agree that most techs are familiar with XML and JSON, but calling CBOR a "pivotal data format" is a stretch compared to Protobuf, Parquet, Avro, Cap'n Proto, and many others: https://en.m.wikipedia.org/wiki/Comparison_of_data-serializa...

dang - 15 hours ago

Related. Others?

Begrudgingly Choosing CBOR over MessagePack - https://news.ycombinator.com/item?id=43229259 - March 2025 (78 comments)

MessagePack vs. CBOR (RFC7049) - https://news.ycombinator.com/item?id=23838565 - July 2020 (2 comments)

CBOR – Concise Binary Object Representation - https://news.ycombinator.com/item?id=20603378 - Aug 2019 (71 comments)

CBOR – Concise Binary Object Representation - https://news.ycombinator.com/item?id=10995726 - Jan 2016 (36 comments)

Libcbor – CBOR implementation for C and others - https://news.ycombinator.com/item?id=9597198 - May 2015 (5 comments)

CBOR – A new object encoding format - https://news.ycombinator.com/item?id=6932089 - Dec 2013 (9 comments)

RFC 7049 - Concise Binary Object Representation (CBOR) - https://news.ycombinator.com/item?id=6632576 - Oct 2013 (52 comments)

_the_inflator - 19 hours ago

Love or hate JSON, the beauty and utility stem from the fact that you have only the fundamental datatypes as a requirement, and that's it.

Structured data that, by nesting, pleases the human eye, reduced to the max in a key-value fashion, pure minimalism.

And while you have to write type converters all the time for datetime, BLOBs etc., these converters are the real reasons why JSON is so useful: every OS or framework provides the heavy lifting for it.

So any elaborated new silver bullet would require solving the converter/mapper problem, which it can't.

And you can complain or explain with JSON: "Comments not a feature?! WTF!" - Add a field with the key "comment"

Some smart guys went the extra mile and nevertheless demanded more, because wouldn't it be nice to have some sort of "strict JSON"? JSON schema was born.

And here you can visibly experience the inner conflict of "on the one hand" vs "on the other hand". Applying schemas to JSON is a good cause and reasonable, but guess what happens to JSON? It looks like unreadable bloat, which means XML.

Extensibility is fine, basic operations appeal to both demands, simple and sophisticated, and don't impose the sophistication on you just for a simple 3-field exchange about dog food preferences.

brookst - a day ago

Odd that the XML and JSON sections show examples of the format, but CBOR doesn’t. I’m left with no idea what it looks like, other than “building on JSON’s key/value format”.
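For a rough idea of what CBOR looks like on the wire, here is a minimal sketch of an encoder. It covers only non-negative integers, text strings, arrays and maps; the names `head` and `encode` are made up for illustration and this is not a full RFC 8949 implementation.

```python
import struct

def head(major: int, n: int) -> bytes:
    """Build a CBOR head: 3-bit major type plus the argument n."""
    if n < 24:
        # Small arguments live directly in the low 5 bits of the first byte.
        return bytes([(major << 5) | n])
    # Otherwise additional info 24..27 says 1, 2, 4 or 8 argument bytes follow.
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if n < 1 << (8 * struct.calcsize(fmt)):
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("argument does not fit in 64 bits")

def encode(value) -> bytes:
    """Encode a small subset of Python values as CBOR."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int) and value >= 0:
        return head(0, value)                      # major type 0: unsigned int
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return head(3, len(raw)) + raw             # major type 3: text string
    if isinstance(value, list):
        return head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(encode(k) + encode(v) for k, v in value.items())
        return head(5, len(value)) + body          # major type 5: map
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(encode({"a": 1, "b": [2, 3]}).hex())  # a26161016162820203
```

Read byte by byte: `a2` is a map with 2 pairs, `61 61` is the one-character text string "a", `01` is the integer 1, `61 62` is "b", `82` is an array of 2 items, then `02` and `03`.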

makapuf - a day ago

ASN.1, while complex, really seems to be a step up from those (even if older) in terms of terseness (as a binary encoding) and generality.

nabla9 - a day ago

CBOR is for when you need the option of a very small code size. If you can always use compression, CBOR provides no significant data size improvement over JSON.

On code size it also beats BSON, EBML and others.

aidenn0 - 16 hours ago

I admit I got nerd-sniped here, but the table for floats[1] suggests that 10000.0 be represented as a float32. However, isn't it exactly representable as 0x70e2 in float16[2]? There are only 10 significant bits to the mantissa (including the implicit 1), while float16 has 11 so there's even an extra bit to spare.

1: https://cborbook.com/part_1/practical_introduction_to_cbor.h...

2: i.e. 1.220703125×2¹³
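A quick way to check the float16 claim, as a sketch using only Python's standard `struct` module (the `e` format code is IEEE 754 binary16). The `cbor_item` line assumes the usual CBOR half-precision head byte `0xf9`; the variable names are just for illustration.

```python
import struct

value = 10000.0

# Pack as big-endian IEEE 754 half precision and read it back.
half = struct.pack(">e", value)
roundtrip = struct.unpack(">e", half)[0]

print(half.hex())          # 70e2
print(roundtrip == value)  # True, so no precision is lost

# In CBOR, a half-precision float is the head byte 0xf9 followed by these 2 bytes.
cbor_item = b"\xf9" + half
print(cbor_item.hex())     # f970e2
```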

JoelJacobson - 14 hours ago

Fun fact: CBOR is used within the WebAuthn (Passkey) protocol.

To do Passkey-verification server-side, I had to implement a pure-SQL/PLpgSQL CBOR parser, out of fear that a C-implementation could crash the PostgreSQL server: https://github.com/truthly/pg-cbor

fjfaase - a day ago

This is a link to just one section of a larger book. The next section compares CBOR with a number of other binary storage formats, such as protobuf.

gethly - a day ago

I wish browsers would support CBOR natively so I could just return CBOR instead of JSON(++speed --size ==win) and not have to be concerned with decoding it or not being able to debug requests in dev console.

glenjamin - 20 hours ago

The only mention I can see in this document of compression is

> Significantly smaller than JSON without complex compression

Although compression of JSON could be considered complex, it's also extremely simple in that it's widely used and usually performed in a distinct step - often transparently to a user. Gzip and, increasingly, zstd are widely used.

I'd be interested to see a comparison between compressed JSON and CBOR, I'm quite surprised that this hasn't been included.
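A rough way to run that comparison locally, as a sketch. The sample records are invented, the `cbor` function is a hand-rolled encoder for a tiny subset of CBOR (non-negative ints below 2^32, text strings, lists, dicts), and zlib stands in for gzip since both use DEFLATE. Results will vary a lot with the shape of the data, so no numbers are claimed here.

```python
import json
import struct
import zlib

def cbor_head(major: int, n: int) -> bytes:
    """Initial byte plus the shortest argument encoding (up to 32 bits)."""
    if n < 24:
        return bytes([major << 5 | n])
    if n < 0x100:
        return bytes([major << 5 | 24, n])
    if n < 0x10000:
        return bytes([major << 5 | 25]) + struct.pack(">H", n)
    return bytes([major << 5 | 26]) + struct.pack(">I", n)

def cbor(value) -> bytes:
    """Tiny CBOR encoder: non-negative ints, text strings, lists, dicts."""
    if isinstance(value, int):
        return cbor_head(0, value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return cbor_head(3, len(raw)) + raw
    if isinstance(value, list):
        return cbor_head(4, len(value)) + b"".join(cbor(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(cbor(k) + cbor(v) for k, v in value.items())
        return cbor_head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")

# Hypothetical sample data: 200 small, repetitive records.
records = [
    {"id": i, "name": f"sensor-{i}", "reading": i * 37 % 1000}
    for i in range(200)
]

json_raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
cbor_raw = cbor(records)

for label, raw in (("json", json_raw), ("cbor", cbor_raw)):
    compressed = zlib.compress(raw, 9)
    print(f"{label}: raw={len(raw)} bytes, deflate={len(compressed)} bytes")
```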

JimDabell - 20 hours ago

Previously:

CBOR – Concise Binary Object Representation - https://news.ycombinator.com/item?id=20603378 - Aug 2019 (71 comments)

Begrudgingly Choosing CBOR over MessagePack - https://news.ycombinator.com/item?id=43229259 - Mar 2025 (78 comments)

johnisgood - 19 hours ago

Erlang / Elixir has amazing support for ASN.1! I love it.

https://www.erlang.org/doc/apps/asn1/asn1_getting_started.ht...

https://www2.erlang.org/documentation/doc-14/lib/asn1-5.1/do...

I am using ASN.1 to communicate between a client (Java / Kotlin) and server (Erlang / Elixir), but unfortunately Java / Kotlin has somewhat shitty support for ASN.1 in comparison to Erlang.

camgunz - 19 hours ago

Oh good, another CBOR thread. Disclaimer: I wrote and maintain a MessagePack implementation. I've also bird dogged this for a while, HN search me.

Mostly, I just want to offer a gentle critique of this book's comparison with MessagePack [0].

> Encoding Details: CBOR supports indefinite-length arrays and maps (beneficial for streaming when total size is unknown), while MessagePack typically requires fixed collection counts.

This refers to CBOR's indefinite length types, but awkwardly, streaming is a protocol level feature, not a data format level feature. As a result, there's many better options, ranging from "use HTTP" to "simply send more than 1 message". Crucially, CBOR provides no facility for re-syncing a stream in the event of an error, whether that's network or simply a bad encoding. "More features" is not necessarily better.
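For anyone who hasn't seen the feature being discussed, here is a minimal sketch of the two array encodings. It is restricted to integers 0..23 so that each item is a single byte; the function names are made up for illustration.

```python
def definite_array(items) -> bytes:
    """Definite length: the item count is in the head, so it must be known up front."""
    items = list(items)
    if len(items) >= 24 or any(not 0 <= i < 24 for i in items):
        raise ValueError("sketch only handles fewer than 24 items, each in 0..23")
    return bytes([0x80 | len(items)]) + bytes(items)

def indefinite_array(items) -> bytes:
    """Indefinite length: open with 0x9f, emit items as they arrive, close with 0xff."""
    out = bytearray([0x9F])      # major type 4 (array), additional info 31
    for i in items:              # works with a generator of unknown length
        if not 0 <= i < 24:
            raise ValueError("sketch only handles integers in 0..23")
        out.append(i)
    out.append(0xFF)             # the "break" stop code
    return bytes(out)

print(definite_array([1, 2, 3]).hex())              # 83010203
print(indefinite_array(i for i in (1, 2, 3)).hex()) # 9f010203ff
```

Note that nothing in the indefinite form helps a reader recover if a byte is lost or corrupted mid-stream: the decoder simply keeps reading items until it happens to see `0xff` in item position.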

> Standardization: CBOR is a formal IETF standard (RFC 8949) developed through consensus, whereas MessagePack uses a community-maintained specification. Many view CBOR as a more rigorous standard inspired by MessagePack.

Well, CBOR is MessagePack. Carsten Bormann forked MessagePack, changed some of the tag values, wrote a standard around it, and submitted it to the IETF against the wishes of MessagePack's creators.

> Extensibility: CBOR employs a standardized semantic tag system with an IANA registry for extended types (dates, URIs, bignums). MessagePack uses a simpler but less structured ext type where applications define tag meanings.

Warning: I have a big rant about the tag registry.

The facilities are the same (well, the tag can be up to 8 bytes instead of 1 byte, but w/e); it's TLV all the way down (Bormann ripped this also). Bormann's contribution is the registry, which is bonkers [1]. There's... dozens of extensions there? Hundreds? No CBOR implementation supports anywhere near all this stuff. "Universal Geographical Area Description (GAD) description of velocity"? "ur:request, Transaction Request identifier"?

The registry isn't useful. Here are the possible scenarios:

If something is in high demand and has good support across platforms, then it's a no-brainer to reserve a tag. MP does this with timestamps.

If something is in high demand, but doesn't have good support across platforms, then you're putting extra burden on those platforms. Ex: it's not great if my tiny microcontroller now has to support bignums or 128-bit UUIDs. Maybe you do that, or you make them optional, but that leads us to...

If something isn't in high demand or can't easily be supported across platforms, but you want support for it anyway, there's no need to tell anyone else you're using that thing. You can just use it. That's MP's ext types.

CBOR seems to imagine that there's a hypothetical general purpose decoder out there that you can point to any CBOR API, but there isn't and there never will be. Nothing will support both "Used to mark pointers in PSA Crypto API IPC implementation" and "PlatformV_HAS_PROPERTY" (I just cannot get over this stuff). There is no world where you tell the IETF about your tags, define an API with them, and someone completely independently builds a decoder for them. It will always be a person who cares about your specific tags, in which case, why not just agree on the ext types ahead of time? A COSE decoder doesn't also need to decode a "RAINS Message".
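To make the "same mechanism" point concrete, here is a sketch that hand-encodes one timestamp both ways: as CBOR tag 1 (epoch-based date/time) and as MessagePack's timestamp 32 ext type (-1). The bytes are assembled manually rather than through any library, and the variable names are illustrative.

```python
import struct

ts = 1363896240  # seconds since the Unix epoch (2013-03-21T20:04:00Z)

# CBOR: tag head (major type 6, tag number 1), then a uint32 head (0x1a) and payload.
cbor_bytes = bytes([0xC0 | 1, 0x1A]) + struct.pack(">I", ts)

# MessagePack: fixext 4 marker (0xd6), signed ext type -1, then 4 payload bytes.
msgpack_bytes = bytes([0xD6]) + struct.pack(">b", -1) + struct.pack(">I", ts)

print(cbor_bytes.hex())     # c11a514b67b0
print(msgpack_bytes.hex())  # d6ff514b67b0
```

In both cases a small type marker is followed by the same four payload bytes; the difference is who assigns the meaning of the marker.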

> Performance and Size: Comparisons vary by implementation and data. CBOR prioritizes small codec size (for constrained devices) alongside message compactness, while MessagePack focuses primarily on message size and speed.

I can't say I fully understand what this means, but CBOR and MP are equivalent here, because CBOR is MP.

> Conceptual Simplicity: MessagePack's shorter specification appears simpler, but CBOR's unification of types under its major type/additional info system and tag mechanism offers conceptual clarity.

Even if there's some subjectivity around "conceptual simplicity/clarity", again CBOR and MP are equivalent here because they're functionally the same format.
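For readers unfamiliar with the "major type/additional info" system the quote refers to, a small sketch: every CBOR item starts with one byte whose top 3 bits are the major type and whose low 5 bits are the additional info. The helper names below are made up for illustration.

```python
MAJOR_TYPES = (
    "unsigned int", "negative int", "byte string", "text string",
    "array", "map", "tag", "simple/float",
)

def split_initial_byte(b: int):
    """Return (major type, additional info) for a CBOR initial byte."""
    return b >> 5, b & 0x1F

def describe(b: int) -> str:
    major, info = split_initial_byte(b)
    return f"{MAJOR_TYPES[major]}, additional info {info}"

print(describe(0xA2))  # map, additional info 2
print(describe(0xF9))  # simple/float, additional info 25
print(describe(0x18))  # unsigned int, additional info 24
```

Additional info values below 24 carry the argument directly (a length, count or small integer); 24 through 27 mean the argument follows in the next 1, 2, 4 or 8 bytes.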

---

I have some notes about the blurb above too:

> MessagePack delivers greater efficiency than JSON

I think it's probably true that the fastest JSON encoders/decoders are faster than the fastest MP encoders/decoders. Not that JSON performance has a higher ceiling, but it's got gazillions of engineering hours poured into it, and rightly so. JSON is also usually compressed, so space benefits only matter at the perimeters. I'm not saying there's no case for MP/CBOR/etc., just that the efficiency/etc. gap is a lot smaller than one would predict.

> However, MessagePack sacrifices human-readability

This, of course, applies to CBOR as well.

> ext mechanism provides less structure than CBOR's IANA-registered tags

Again the mechanism is the same, only the registry is different.

[0]: https://cborbook.com/introduction/cbor_vs_the_other_guys.htm...

[1]: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml

naggumsghost - 10 hours ago

If GML was an infant, SGML is the bright youngster that far exceeds expectations and made its parents too proud, but XML is the drug-addicted gang member who had committed his first murder before he had sex, which was rape.

https://www.schnada.de/grapt/eriknaggum-xmlrant.html

We're going to have to think up something worse for CBOR.

darthrupert - a day ago

CBOR has always seemed to me like the most promising data format for efficient data transfer. Somewhat weird how little use it has.

naikrovek - a day ago

people are just straight up afraid to write their own binary formats, aren't they.

it's not hard, it's exactly like creating your own text format but you write binary data instead of text, and you can't read it with your eyes right away (but you can after you've looked at enough of it.) there is nothing to fear or to even worry about; just try it. look up how things like TLV work on wikipedia. you can do just about anything you would ever need with plain binary TLV and it's gonna perform like you wouldn't believe.

https://en.wikipedia.org/wiki/Type%E2%80%93length%E2%80%93va...
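A minimal sketch of what such a hand-rolled TLV format might look like, assuming a 1-byte tag and a 4-byte big-endian length. The layout, the tag numbers and the function names are all invented for illustration.

```python
import struct

HEADER = ">BI"                      # 1-byte tag, 4-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)

def tlv_encode(tag: int, value: bytes) -> bytes:
    """Prefix the value with its tag and length."""
    return struct.pack(HEADER, tag, len(value)) + value

def tlv_decode(buf: bytes):
    """Yield (tag, value) pairs from a buffer of concatenated records."""
    pos = 0
    while pos < len(buf):
        tag, length = struct.unpack_from(HEADER, buf, pos)
        pos += HEADER_SIZE
        value = buf[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated record")
        yield tag, value
        pos += length

# Hypothetical application-level tags.
TAG_NAME, TAG_AGE = 1, 2

blob = tlv_encode(TAG_NAME, b"Rex") + tlv_encode(TAG_AGE, struct.pack(">H", 7))
print(blob.hex())
print(list(tlv_decode(blob)))  # [(1, b'Rex'), (2, b'\x00\x07')]
```

A reader that doesn't recognise a tag can skip the record using the length alone, which is most of what makes TLV easy to extend.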

binary formats are always going to be 1-2 orders of magnitude faster than plain text formats, no matter which plain text format you're using. writing a viewer so you can easily read the data isn't zero-effort like it is for JSON or XML where any existing text editor will do, but it's not exactly hard, either. your binary format reading code is the core of what that viewer would be.

once you write and use your own binary format, existing binary formats you come across become a lot less opaque, and it starts to feel like you're developing a mild superpower.

kookamamie - 21 hours ago

The article reads like semi-slop with its numerous lists and overly long explanations of obvious things, such as how XML came to be.

Jean-Papoulos - 21 hours ago

Obligatory https://xkcd.com/927/

zbendefy - a day ago

How different is CBOR compared to BSON? Both seem to be binary json-like representations.

Edit: BSON seems to contain more data types than JSON, and as such it is more complex, whereas CBOR doesn't add to JSON's existing structure.