JSON Feed

jsonfeed.org

486 points by fold 9 years ago · 224 comments

mstade 9 years ago

> JSON Feed files must be served using the same MIME type — application/json — that’s used whenever JSON is served.

So then it's JSON, and I'll treat it as any other JSON: a document that is either an object or an array, that can include other objects or arrays, as well as numbers and strings. Property names don't matter, nor does the order of properties or array items, or whatever values are contained therein.

Please don't try to overload media types like this. Atom isn't served as `application/xml` precisely because it isn't XML; it's served as `application/atom+xml`. For a media type that is JSON-like but isn't JSON, you may wish to look at `application/hal+json`; incidentally there's also `application/hal+xml` for the XML variant.

Or as someone else rightly suggested, consider just using JSON-LD.

  • anshou- 9 years ago

    It's worth pointing out that any valid JSON value is a valid JSON document. There is no requirement or guarantee that an array or an object is the top-level value in a JSON document.

    "I am a valid JSON document. So is the Number below, and in fact every line below this line."

    4

    null

    • jameshart 9 years ago

      Actually actually... the JSON spec doesn't define the concept of a JSON document. Neither http://www.json.org/ nor http://www.ecma-international.org/publications/files/ECMA-ST... actually specifies that a JSON 'document' is synonymous with a JSON 'value'.

      Now it's also true that JSON doesn't specify an entity that can be either an object or an array but not a string or a bool or a number or null. So it's kind of true that JSON doesn't say that only an object or array are valid root elements.

      But JSON also says "JSON is built on two structures" - arrays and objects. It defines those two structures in terms of 'JSON values'. But it's a reasonable way to read the JSON spec to say that it defines a concept of a 'JSON structure' as an array or object - but not a plain value. And then to assume that a .json file contains a JSON 'structure'.

      Basically... JSON's just not as well defined a standard as you might hope.

      edit: And now I'm going to well actually myself: Turns out https://tools.ietf.org/html/rfc4627 defines a thing called a 'JSON text' which is an array or an object, and says that a 'JSON text' is what a JSON parser should be expected to parse.

      So - pick a standard.

      • niftich 9 years ago

        JSON is in fact defined in (at least) six different places, as described in the piece 'Parsing JSON is a Minefield' [1] (HN: [2]).

        The problem is perhaps not as egregious as with "CSV" -- which is more of a "technique" than a format, even though someone retroactively wrote a spec after 30 years of customary usage; but it does manifest in various edge cases like the ones we're debating.

        [1] http://seriot.ch/parsing_json.php [2] https://news.ycombinator.com/item?id=12796556

      • d0mine 9 years ago

        Why are you referencing the obsolete RFC? There is no restriction to object/array for the JSON text in the current RFC: https://tools.ietf.org/html/rfc7159

        • dwaite 9 years ago

          The current RFC recommends the use of an object or array for interoperability with the previous specification. JSON being a bit of a clusterf* of variants, they tried to make the RFC broad and then place interoperability limitations on it. (lenient in what you accept, etc etc)

        • jameshart 9 years ago

          Because I just discovered that there was, at least once, a specification that actually defined JSON that way, where previously I had thought it had only been ambiguously described, and I thought that was interesting.

    • paulddraper 9 years ago

      > There is no requirement or guarantee that an array or an object are the top-level value in a JSON document.

      Alas, if only that were true.

      RFC 4627:

      > A JSON text is a serialized object or array. The MIME media type for JSON text is application/json.

      RFC 7159:

      > A JSON text is a serialized value. Note that certain previous specifications of JSON constrained a JSON text to be an object or an array. Implementations that generate only objects or arrays where a JSON text is called for will be interoperable in the sense that all implementations will accept these as conforming JSON texts.

      IIRC, Ruby's JSON parser was written to be strictly RFC 4627 compliant, and yields a parser error for non-array non-object texts.

      Since JSON isn't versioned, no one has any idea what "JSON" really means, or what "standard" is being followed.
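
      To make the divergence concrete, here's a quick sketch using Python's json module, which follows the looser RFC 7159 reading and happily accepts a bare scalar:

          import json

          # RFC 7159: any JSON value may appear at the top level.
          print(json.loads("4"))       # -> 4
          print(json.loads("null"))    # -> None
          print(json.loads('"hi"'))    # -> 'hi'

          # A strictly RFC 4627-compliant parser (like the Ruby one
          # mentioned above) would reject all three, accepting only an
          # object or an array as the outermost value.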

    • mstade 9 years ago

      You're right, thanks for the correction! It also kind of reinforces my point, I feel: any JSON document is just that, a JSON document; it doesn't carry more semantics just because you say so. My JSON parser will still just see simple JSON values, no matter how much I tell it that a certain key should really be a URL, not just a string.

      • tghw 9 years ago

        True, but that's also true of any XML, RSS, Atom, HTML, etc. Websites abuse HTML all the time, and there's nothing saying that just because something is transferred with application/atom+xml that it will be valid or follow the spec.

        It's more of a social agreement. If you get a JSON object from a place you expect a JSON Feed and it has a title and items, then it'll probably work, even if it omits other things.

        • mstade 9 years ago

          So we can ditch media types altogether then? What's the point of having actual contracts if all we need is a handshake and a wink? We're not talking about malformed data here, that's something different entirely and yes – it happens all the time. We're talking about calling a spade a spade.

          If it's JSON your program expects then I should be able to throw any valid JSON at your program and it should work. Granted, it probably won't be a very interesting program precisely because JSON is just generic data without any meaningful semantics.

          This spec is entirely about attaching semantics to JSON documents, but all that gets lost when you forget to let people know the document carries semantics and just call it generic JSON. Maybe that doesn't matter to a JSON-feed-specific app that thinks any JSON is JSON-feed (an equally egregious error), but if there's an expectation that I should be able to point my catch-all program (i.e. web browser) at a URL and have it magically (more like heuristically I guess, potato/tomato) determine that the document retrieved isn't in fact just any JSON, then things are about to get real muddy. Web browsers aren't particularly social, so I suspect a social agreement probably won't work that well.

          Media types aren't just something that someone thought was a nifty idea back in the day, they are pretty important to how the web functions.

          • DougWebb 9 years ago

            If it's JSON your program expects then I should be able to throw any valid JSON at your program and it should work.

            That's not a valid argument, because JSON is just a serialization format for an arbitrary data structure. You can't throw any arbitrary data structure at any program that accepts data and expect it to be able to accept it. Every program that accepts input requires that input to be in a specific format, which is nearly always more specific than the general syntax of the format. And aside from programs that make strict use of XML schemas, they pretty much all use the handshake-and-wink method for enforcing the contract. (Or to put it another way: documentation and input validation.)

            My take on the author's approach is that the content-type is specifying the syntax of the expected input, and the documentation specifies the semantics and details of the data structure. In that respect, the program works like most other programs out there.

            • mstade 9 years ago

              Aww that's not fair – if you're going to quote then don't cherry pick and remove the relevant bits.

              > If it's JSON your program expects then I should be able to throw any valid JSON at your program and it should work. Granted, it probably won't be a very interesting program precisely because JSON is just generic data without any meaningful semantics.

              (Emphasis mine.)

              By doing this you're just reinforcing my argument that just parsing any ol' plain JSON won't make for very interesting programs. JSON is just plain dumb data, it doesn't tell you anything interesting. There may be additional semantics you can glean from a document beyond just its format, if there are mechanisms to describe data in richer terms (HTML is pretty good for this, but oddly enough not a very popular API format) – but JSON has none of these. Yet this spec says you should serve this as just plain ol' boring JSON.

              > And aside from programs that make strict use of XML schemas, they pretty much all use the handshake-and-wink method for enforcing the contract.

              This is just not true. Case in point: web browsers – arguably one of the most successful kinds of program there ever were, with daily active users measuring in the billions – make heavy use of metadata, including media types, to determine how to interpret their input. Not just by way of format (i.e. media type) but also by way of supplemental semantics (e.g. markup, micro formats, links.)

              > My take on the author's approach is that the content-type is specifying the syntax of the expected input, and the documentation specifies the semantics and details of the data structure.

              Which could and should be described in a spec, with a corresponding IANA consideration to include a new and specific media type in the appropriate registry – not by overloading an existing one.

          • SamBam 9 years ago

            I'm not sure what you're arguing. JSONFeed is JSON, unless I'm missing something, just JSON that matches a specific schema.

            If I'm pulling JSON from any API, I expect it to match a certain schema. If I expected { "time": 10121} from a web API and they send me "4", then sure, that's valid JSON, but it doesn't match the schema they promised me in the API.

            Something that's JSON should be marked JSON, even if we're expecting it to follow a schema.

            • nothrabannosir 9 years ago

              > JSONFeed is JSON, unless I'm missing something, just JSON that matches a specific schema.

              Yes, and everything is application/octet-stream, so why have mime types? Because it helps with tooling, discovery, and content negotiation. It is a hint for the poor soul who inherits some undocumented ruby soup calling your endpoint.

              Being as specific as possible with mime types is a convention for a reason. Please don't break it unless you have an explicit reason to.
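
              For instance, a specific type lets a client say what it wants up front. A sketch (Python with requests; the application/jsonfeed type here is hypothetical, not a registered mediatype):

                  import requests

                  # Ask for the feed-specific type, falling back to plain JSON.
                  resp = requests.get(
                      "https://example.org/feed.json",
                      headers={"Accept": "application/jsonfeed, application/json;q=0.5"},
                  )
                  ctype = resp.headers.get("Content-Type", "")
                  if ctype.startswith("application/jsonfeed"):
                      feed = resp.json()  # safe to interpret with JSON Feed semantics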

            • gtramont 9 years ago

              This is exactly one of the things that media-types solve. Simply using application/json doesn't tell me (consumer) anything about the semantics of what I'm reading. It only tells me what "parser" to use. If we have a proper media-type, like application/hal+json, I know exactly how to create a client for that type: I need to use a JSON parser _and_ use the vocabulary defined by HAL…

            • cyphar 9 years ago

              > Something that's JSON should be marked JSON, even if we're expecting it to follow a schema.

              That's what the +json type suffix is for. I wonder how many people in this thread actually have read the mediatype RFCs, because they definitely don't encourage using mediatypes in the way you're describing.

              The whole point of mediatypes is to make it possible to distinguish schemas while also potentially describing the format that the schema is in.

            • jamesmalvi 9 years ago

              This tool may help to validate and format JSON data: https://jsonformatter.org

    • debaserab2 9 years ago

      Beware that many JSON parsers don't agree with this, although your interpretation is the correct interpretation of the spec. Some parsers will only accept either an array or object. If you're building a JSON endpoint you'll be safest returning either an array or object.

    • niftich 9 years ago

      true

    • hyperpallium 9 years ago

      false

  • eriknstr 9 years ago

    Absolutely agree about the MIME type.

    Someone filed an issue and created a pull-request for this after you wrote this comment.

    https://github.com/brentsimmons/JSONFeed/issues/22

    https://github.com/brentsimmons/JSONFeed/pull/23

    I hope they will merge it.

  • rspeer 9 years ago

    Should this web page have been served as `text/hacker-news-comment-thread+html`?

    • niftich 9 years ago

      No. HTML is not formally recognized to be a 'Structured Syntax' upon which semantically richer standalone mediatypes can be built [1]. This is because existing deployments favor a different approach of imbuing additional semantics inside HTML documents -- microformats -- which place the mechanism of understanding on an opportunistic parser, vs. a restrained one that only executes if its preferred mediatype is advertised. Appendix A of RFC 3023 [2] offers a thorough treatment of this matter. Not defining +html is essentially a concession that enables the two schools of thought to coexist side-by-side.

      This is the same difference in schools that I express in a different comment [3] in this thread.

      [1] https://tools.ietf.org/html/rfc6839 [2] https://tools.ietf.org/html/rfc3023#appendix-A [3] https://news.ycombinator.com/item?id=14361842

    • eridius 9 years ago

      No, because this web page isn't using a specialized format.

      But if it were using XHTML, then the proper mime type would be application/xhtml+xml.

  • foota 9 years ago

    It seems that all this spec is, is a structure for an API response. I don't see why it should have a different media type.

    • martin-adams 9 years ago

      I don't believe it should be just application/json, because it's a specific format of JSON. There could be multiple JSON representations of the feed other than jsonfeed that the server supports, and the client could define which ones they Accept.

      So the server could support all of the following:

      application/jsonfeed

      application/rss+xml

      application/atom+xml

      Who knows, maybe RSS and ATOM could be represented in JSON and have the following mime types:

      application/rss+json

      application/atom+json

      If it's just an API response, and it is your API for an application called Widget Factory, then you can, if you want, have your own format:

      application/vnd.widgetfactory+json

      Generally, defining such a mime type should have some specification describing it, otherwise no one can reliably implement a compatible client. JSON Feed has proposed that specification.
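
      A rough sketch of what that negotiation could look like server-side (Python/Flask; the render_* helpers and the application/jsonfeed type are hypothetical stand-ins):

          from flask import Flask, Response, request

          app = Flask(__name__)

          def render_json_feed():  # hypothetical serializer
              return '{"version": "https://jsonfeed.org/version/1", "title": "Example", "items": []}'

          def render_atom():  # hypothetical serializer
              return '<feed xmlns="http://www.w3.org/2005/Atom"></feed>'

          @app.route("/feed")
          def feed():
              # Werkzeug picks the best match from the client's Accept header.
              best = request.accept_mimetypes.best_match(
                  ["application/jsonfeed", "application/atom+xml"])
              if best == "application/jsonfeed":
                  return Response(render_json_feed(), mimetype="application/jsonfeed")
              return Response(render_atom(), mimetype="application/atom+xml")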

  • tootie 9 years ago

    Well, if JSON had namespaces or a standard validation framework, we could have that conversation.

    • vog 9 years ago

      You mean, like JSON schema?

      http://json-schema.org/

      Not sure why you want to emulate XML namespaces in JSON, but JSON schemas can include other JSON schemas and extend other JSON schemas. That accounts for 99.9% of the use cases for namespaces.
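
      A minimal sketch of that include/extend style, using the Python jsonschema package (composition is inlined with allOf rather than $ref to keep it self-contained):

          from jsonschema import validate  # pip install jsonschema

          base_feed = {
              "type": "object",
              "required": ["title", "items"],
              "properties": {
                  "title": {"type": "string"},
                  "items": {"type": "array"},
              },
          }

          # "Extending" the base schema: everything above, plus a required feed_url.
          extended_feed = {
              "allOf": [
                  base_feed,
                  {
                      "required": ["feed_url"],
                      "properties": {"feed_url": {"type": "string"}},
                  },
              ],
          }

          doc = {"title": "My Feed", "feed_url": "https://example.org/feed.json", "items": []}
          validate(instance=doc, schema=extended_feed)  # raises ValidationError on mismatch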

    • mstade 9 years ago

      That's my point though – it doesn't have anything to describe metadata, so trying to cram in additional semantics is futile if you still want to call it JSON. Call it something else and you can attach whatever semantics you'd like, but they think it should be served up as `application/json`, which means all those semantics go out the window.

mindcrime 9 years ago

Do we really need this? Atom is fine for feeds. Avoiding XML just for the sake of avoiding XML, because it isn't "cool" anymore, is just dumb groupthink.

If this industry has a problem, it's FDD - Fad Driven Development and IIICIS (If It Isn't Cool, It Sucks) thinking.

  • pfranz 9 years ago

    Part of me is with you. But even in established languages I've had trouble finding an appropriate xml parser and had to tweak them way more than I thought necessary. I haven't (yet) had that problem with JSON.

    I think with something like feeds there's the possible benefit of becoming a 'hello world' for frameworks. Many frameworks have you write a simple blogging engine or twitter copycat. I don't think I've ever seen that for a feed reader/publisher. People have said that Twitter clients were an interesting playground for new UI concepts and paradigms because the basics were so simple (back when their API keys were less restrictive). Maybe this could be that?

    • mindcrime 9 years ago

      But even in established languages I've had trouble finding an appropriate xml parser and had to tweak them way more than I thought necessary. I haven't (yet) had that problem with JSON.

      Maybe it's just that I work mostly with JVM languages (Java, Groovy, etc.) but I haven't had any problems with handling XML - including Atom - in years. But I admit that other platforms might not have the same degree of support.

      • pfranz 9 years ago

        Most of my experience is from Python. Each time I use it I have to look at the docs for etree (a library that ships with Python). We would hit performance and feature support issues with etree and tried lxml but had binary compatibility issues between our environments.

        The Hitchhiker's Guide to Python[1] (a popular reference for Python) recommends untangle[2] and xmltodict[3], neither of which I've used.

        I feel like other languages I've used had similar brittleness when dealing with xml. I might be biased because, working with xml in an editor, it's difficult to validate visually or grok in general when used in practice.

        [1] http://python-guide-pt-br.readthedocs.io/en/latest/scenarios...

        [2] https://untangle.readthedocs.io/en/latest/

        [3] https://github.com/martinblech/xmltodict

        • moolcool 9 years ago

          Beautiful Soup is alright in most cases. JSON is handled much better than XML by any library I've seen so far, though.

          • pfranz 9 years ago

            Oh yes, I've used Beautiful Soup, too. If I remember correctly I had great luck with html, but issues with xml. It also is only a reader, not a writer.

      • abritinthebay 9 years ago

        > Maybe it's just that I work mostly with JVM languages (Java, Groovy, etc.) but I haven't had any problems with handling XML

        Yeah, no surprise. XML may as well be a native data-type in most core JVM languages.

        It's not the case everywhere else however.

    • dabeeeenster 9 years ago

      What language are you using that doesn't have a working XML parser? REALLY?

      • mattmanser 9 years ago

        He said appropriate XML parser.

        All languages have XML parsers; it's more that a lot of them suck: they might have weird concepts you have to use, or constantly trip you up with namespaces, or make it really hard to write XPath queries.

        • josteink 9 years ago

          > or are constantly tripping you up with namespaces

          You mean requires that you understand the XML format you are working with? Oh noes!

          Namespaces exist, just about everywhere in the world of programming, and they do so for a reason.

          <bar /> is not the same as <foo:bar /> just like http://bar.com is not the same as http://bar.foo.com.

          If that's putting the bar high, I really think I may be suffering a huge disconnect from the rest of my peers in terms of expected capabilities.

          Just because JSON doesn't have namespacing capabilities at all doesn't make it a worthless feature. It's actually what gives you the eXtensibility in XML. As a developer I expect you to understand that.

          (And I wonder how long it will take before the JS-world re-implements this XML-wheel, while again doing so with a worse implementation)

          • acdha 9 years ago

            The reason why many developers hate XML namespaces isn't the concept but the implementations which force you to repeat yourself everywhere. I think a significant amount of the grumbling would go away if XPath parsers were smart enough to assume that //tag was the same as //default-and-only-namespace:tag, or at least allowed you to use //name:tag instead of //{URI}tag because then you could write against the document as it exists rather than mentally having to translate names everywhere.

            Yes, you can write code to add default namespaces when the document author didn't include them and pass in namespace maps everywhere but that's a lot of tedious boilerplate which requires regular updating as URLs change. Over time, people sour on that.

            It really makes me wonder what it'd be like now if anyone had made an effort to invest in making the common XML tools more usable, and in other maintenance, so that e.g. you could actually rely on using XPath 2+.
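
            The boilerplate in question, sketched with Python's lxml (which exposes standard XPath behaviour):

                from lxml import etree

                doc = etree.fromstring(
                    b'<feed xmlns="http://www.w3.org/2005/Atom">'
                    b'<author><name>someone</name></author></feed>')

                # The intuitive query silently returns nothing, because the
                # default namespace makes every element namespaced:
                print(doc.xpath("//author"))  # -> []

                # Instead you must invent a prefix and pass a namespace map
                # to every single query:
                ns = {"a": "http://www.w3.org/2005/Atom"}
                print(doc.xpath("//a:author", namespaces=ns))  # -> [<Element ...>]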

          • jancsika 9 years ago

            > (And I wonder how long time it will take before the JS-world re-implements this XML-wheel, while again doing so with a worse implementation)

            I'm going to guess never. I'm also going to guess that there isn't a single flamewar in the entire history of JSON where someone was trying to figure out how to implement anything close to XML namespaces in JSON. And by "close", I mean something that would require changes to JSON parsers and/or downstream APIs to accommodate potentially bipartite keys.

            • josteink 9 years ago

              You never know. This is what they said about schemas too not many years back.

              • jancsika 9 years ago

                Have there been any discussions whatsoever about adding some sort of namespacing mechanism to JSON?

                • rspeer 9 years ago

                  Well, there's JSON-LD (JSON Linked Data) already.

                  It's for making interoperable APIs, so there is a good motivation for namespaces. But the namespaces are much less intrusive than XML namespaces. Ordinary API consumers don't even have to see them.

                  One of the key design goals of JSON-LD was that -- unlike its dismal ancestor, RDF/XML -- it should produce APIs that people actually want to use.

                  • jancsika 9 years ago

                    Thanks, I haven't explored JSON-LD before.

                    But that's not a case of adding namespaces to JSON, is it?

                    What I mean is if one were to take the skeptical position that JSON is going to end up "re-inventing the XML wheel", that would mean JSON advocates would need to push namespaces into the JSON spec as a core feature of the format. I've never read a discussion of such an idea, but I'd like to if they exist.

                    edit: clarification

                    • rspeer 9 years ago

                      Well, yeah, perhaps the craziest thing about XML is that it has namespaces built into its syntax with no realistic model of how or why you would be mashing up different sources of XML tags.

                      Namespaces are about the semantics of what strings refer to. They belong in a layer with semantics, like JSON-LD, not in the definition of the data transfer format.

                      I am convinced that nobody would try to add namespaces to JSON itself. Just about everyone can tell how bad an idea that would be.

                      • jancsika 9 years ago

                        > Well, yeah, perhaps the craziest thing about XML is that it has namespaces built into its syntax with no realistic model of how or why you would be mashing up different sources of XML tags.

                        The thing that gets me is that they were added to XML, so the downstream APIs then got mirrored interfaces like createElementNS and setAttributeNS that cause all sorts of subtle problems. With SVG, for example, this generates at least two possible (and common) silent errors-- 1) the author creates the SVG in the wrong namespace, and/or 2) more likely, the author mistakenly sets the attribute in the SVG namespace when it should be created in the default namespace. These errors are made worse by the fact that there is no way to fetch that long SVG namespace string from the DOM window (aside from injecting HTML and querying the result)-- judging from Stack Exchange, users are manually typing it (often with typos) into their programs and generating errors that way, too.

                        Worse, as someone on this site pointed out, multiple inline SVGs can still have attributes that easily suffer from namespace clashes in the <defs> section. It's almost comical-- the underlying format has a way to prevent name clashes between multiple attributes inside a single tag that share the same name-- setAttributeNS-- but is no help at all in this area.

                        edit: typo and clarification

  • djur 9 years ago

    XML parsers have a pretty bad track record for security vulnerabilities. If I was writing code to distribute that was going to be parsing arbitrary data from third parties (which is the RSS/Atom use case), I would be more comfortable trusting the average JSON parser than the average XML parser.

    Otherwise, I agree with the "if it ain't broke" principle. There's also cases where so much ad hoc complexity is built on top of JSON that you end up with the same problems XML has, except with less battle-tested implementations.

    • tetrep 9 years ago

      As terrible as XML parsers can be, they've never been as bad as "XMLdoc = eval(XMLString)". I'd be more likely to trust a JSON parser not written in JavaScript than an arbitrary XML parser, but that's only because of the XML specification itself, which includes such features as including arbitrary content as specified by URLs (including local (to the parser) files!). Great ideas when you can trust your XML document, not so great otherwise.

  • armandososa 9 years ago

    It is very likely that I am an idiot, but I've always found parsing XML too hard, especially compared to JSON, which is almost too easy.

    • martijndwars 9 years ago

      Whether parsing XML is easy or hard, how often do you actually write an XML parser? If I'm digesting a JSON/XML document, I resort to a parser library for the language that I'm using at that point, so the complexity of writing such a parser is pretty much non-existent. Definitely not a compelling reason to switch to JSON.

      • morgante 9 years ago

        Most XML parsers I've used are leaky abstractions. Even once the document is parsed, actually accessing the data can require a lot more complexity than accessing parsed JSON data.

        • weberc2 9 years ago

          IIRC, the popular C++ implementations were glorified tokenizers. It was up to you to figure out which tokens were data and how those tokens related to each other.

          • icebraining 9 years ago

            Ah, SAX. People built some true horrors with that API, just because it was "more performant" than DOM. Never mind that their hacked-together tree builders often leaked like sieves.

    • mstade 9 years ago

      If there was an `XML.parse` just like there's `JSON.parse`, I doubt you'd say the same. As it stands, the added complexity in JS-land is to import a library that provides this functionality for you. Fortunately there are many, but I agree a built-in would be nice. It's a bit of a shame that E4X never landed in JS.

      • moduspol 9 years ago

        It's more than JUST library support. It's also that JSON deserializes into common native data types naturally (dictionary, list, string, number, null).

        You can deserialize XML into the same data types, but it's not anywhere near as clean because of how extensible XML is. That's a big part of what's made JSON successful.

        • mstade 9 years ago

          Right, but you inevitably end up with boilerplate "massage" code around your data anyway. Case in point: dates, any number that isn't just a number (e.g. currencies or big numbers), URLs, file paths, hex, hashes. Basically any type that carries any kind of semantics beyond array, object, string, number, or null will require this boilerplate, only your data format has no way of describing them except for out-of-band specs, if you want to call them such.

          At least XML has schemas, and even if all you're doing is deserializing everything into JsonML-like objects you're still better off, because you'll have in-band metadata to point you in the right direction.
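
          For instance, a typical bit of massage code might look like this (a Python sketch; the field names are invented for illustration):

              import json
              from datetime import datetime, timezone
              from urllib.parse import urlparse

              raw = json.loads('{"published": "2017-05-17T08:02:12Z", '
                               '"url": "https://example.org/post"}')

              # Nothing in the JSON itself says these strings are a timestamp
              # and a URL, so every consumer re-encodes that knowledge by hand:
              published = datetime.strptime(
                  raw["published"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
              url = urlparse(raw["url"])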

          • spc476 9 years ago

            CBOR [1] allows the semantic tagging of data values and makes a distinction between binary blobs (a collection of 8-bit values) and text (which is defined as UTF-8).

            [1] RFC-7049. Also check out http://cbor.io/

          • sacheendra 9 years ago

            IMHO the boilerplate code is much easier to read than understanding the nuances of XML if I have to read a document.

            {"type":"currency", "unit":"euro", "amount": 10}

            feels easier to understand than

            <currency unit="euro">10</currency>

            • mindcrime 9 years ago

              Maybe it's just conditioning, but I find the latter example easier to read and understand. In fact, I'd say that - in general - I find XML better in terms of human readability than JSON. I guess it just goes to show that we all see certain things differently. shrug

              • abritinthebay 9 years ago

                I think that's totally reasonable - because it was after all one of the goals of XML. That is, to be human readable.

                There is a difference however between readable + parsable vs parsable + easily dealt with.

                XML was not the latter. You have to do more work to traverse and handle XML inside your application than you do JSON, and most of the (reasonable) reasons for this are due to features that most use cases don't need.

                JSON makes the common case easy, XML doesn't.

            • DougWebb 9 years ago

              How about:

              <rec type="currency" unit="euro" amount="10" />

              I don't think your problem is with the syntax, necessarily. It seems more like you prefer name/value pairs over semantic markup.

            • moolcool 9 years ago

              The biggest problem with XML is how easy it is to make a very bad schema, and how hard those can be to parse

          • mstade 9 years ago

            Also, for what it's worth, your point is exactly why I mentioned E4X. Sure wasn't a panacea, but it had some things going for it.

        • dangerlibrary 9 years ago

          This is really only true in dynamically typed languages. From personal experience: parsing json in Java or Go without just treating everything as a bag of Object or an interface{} requires a ton of menial boilerplate work.

          Super nice in python/ruby/javascript, though.

          • kitsunesoba 9 years ago

            Swift 4 will have JSON encoding/decoding built in, and I wouldn't be surprised to see such a feature spring up in other modern languages too. Once that boilerplate is eliminated, json is a pretty decent solution.

            https://www.hackingwithswift.com/swift4

          • tveita 9 years ago

            > parsing json in Java or Go without just treating everything as a bag of Object or an interface{} requires a ton of menial boilerplate work.

            From my experience in Java it is pretty simple using a library like Jackson. You define the types you expect to read, maybe add some annotations, and then it's one function call to deserialize. IIRC Go has something similar in its json library.
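
            The same define-types-then-deserialize pattern can be sketched in Python with dataclasses (a rough analogue, not the Jackson API):

                import json
                from dataclasses import dataclass

                @dataclass
                class Item:
                    id: str
                    url: str

                # One call to parse, one comprehension to lift into typed objects.
                data = json.loads('{"items": [{"id": "1", "url": "https://example.org/1"}]}')
                items = [Item(**obj) for obj in data["items"]]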

            • cdelsolar 9 years ago

              Yes, it's arguably nicer in Go, because you specify exactly what types and field names you expect, and then it's just a simple json.Unmarshal

          • moduspol 9 years ago

            Sure--it's kind of a pain in Swift, too.

            Wouldn't it just be worse with XML, though? I get that people don't realistically parse it themselves and libraries are smart enough to use schemas to deserialize, but there's nothing inherent about JSON that makes it unable to conform to a schema or be parsable by libraries into native strongly-typed objects the same way.

            • mstade 9 years ago

              Except JSON doesn't have semantics to describe schemas, only arrays, objects, strings, numbers and null. You can say "but this key is special" but then it's not JSON anymore. And if you're ok with that, may as well just use JSON-LD or some other JSON-like format.

          • zokier 9 years ago

            Idk, I think JSON parsing is pretty ergonomic in Rust, definitely nicer than your typical XML DOM.

        • zeveb 9 years ago

          Of course, JSON doesn't support complex numbers, bignums, rationals, cryptographic keys &c. And it'd be even worse than XML to try to represent programs in.

          JSON is definitely easier to decompose into a simple-to-manipulate bag-of-values than is XML.

      • zokier 9 years ago

        XML is fundamentally much more complex than JSON so any XML parsing library will inevitably present more complicated API. I kinda like XML (!), but there is no point pretending that using it is as simple as JSON.

        • stephenr 9 years ago

          I think that depends what you mean by "using it".

          XML can convey a lot more semantic meaning than JSON ever will, and standardisation of things like XPath, DOM, XSLT, etc provides a lot of power when working with XML documents.

          With JSON, essentially everything is unknown. You can't just get all child nodes of an object, or get all objects of a certain type, using standard methods. You need to know what object key 'child' nodes are referenced by, or loop through them all and hope that what you find is actually an array of child nodes, and not e.g. an array of property values. Finding all objects of a given type means knowing how the type is defined, AND the aforementioned "how do i get child nodes" to allow you to traverse the document.

          Of course that assumes what you have is a document, and not just a string encoded as JSON. Or a bool/null.

          My point is, the tooling around XML is very mature. "Use" of a data format is a very broad topic, and covers a lot more than just "i want to get this single property value".

        • nonodename 9 years ago

          Absolutely. Right tool for the right job. Mixed content (perhaps a paragraph with bold and italics) is absolutely horrible in JSON because it lacks the complexity that XML has to cope with this.

  • chc 9 years ago

    You're basically saying that this isn't technically better, just more socially acceptable right now. I think you're right, but it seems to me that Atom's problem is primarily a social one. So even if this doesn't carry any technical advantages, a format with a strong social "in" is precisely what we need to make feeds a thing again.

  • oxguy3 9 years ago

    To be honest, I'm really excited about the prospect of JSON based feeds. Right now, there's no easy way to work with Atom/RSS feeds on the command-line (that I know of anyway), which is something I often wish I could do. With a JSON feed, I can just throw the data at jq (https://stedolan.github.io/jq/) and have a bash script hacked together in 10 minutes to do whatever I want with the feed.

    • falcolas 9 years ago

      I give you libxml:

          xmllint --xpath '//element/@attribute' feed.xml
      
      There's a good chance it's already installed on your Mac.

    • chriswarbo 9 years ago

      There are a few nice XML processing utilities. I tend to use xmlstarlet and/or xidel. This lets me use XPath, jQuery-style selectors, etc.

      I agree that jq is really nice though. In particular, I still find JSON nicer than XML in the small-scale (e.g. scripts for transforming ATOM feeds) because:

      - No DTDs means no unexpected network access or I/O failures during parsing

      - No namespaces means names are WYSIWYG (no implicit prefixes which may/may not be needed, depending on the document)

      - All text is in strings, rather than 'in between' elements

      - No redundant element/attribute distinction

      Even with tooling, these annoyances with XML leak through. As an example, xmlstarlet can find the authors in an ATOM file using an XPath query like '//author'; except if the document contains a default namespace, in which case it'll return no results since that XPath isn't namespaced.

      This sort of silently-failing, document-dependent behaviour is really frustrating, requiring two branches (one for documents with a default namespace, one for documents without) and text-based bash hackery to look for and dig out any default namespace prior to calling xmlstarlet :(

      http://xmlstar.sourceforge.net

      http://www.videlibri.de/xidel.html

    • Animats 9 years ago

      I have an RSS client written in Rust that builds as a command line program.[1] I wrote this in 2015, and it needs to be modernized and made a library crate, but it will build and run with the current Rust environment. It's not that hard to parse XML in Rust. Most of the code volume is error handling.

      [1] https://github.com/John-Nagle/rust-rssclient

    • sillysaurus3 9 years ago

      Surely there's an xml->json converter somewhere.

      • duskwuff 9 years ago

        It's kind of tough to convert XML directly to other formats (including, but not limited to, JSON), because there are a lot of XML features that don't map cleanly onto JSON, such as:

        • Text nodes (especially whitespace text nodes)

        • Comments

        • Attributes vs. child nodes

        • Ordering of child nodes

        • eponeponepon 9 years ago

          As it happens, XSLT 3.0 and XPath 3.0 both have well documented and stable features for doing exactly this. Roundtripping XML to JSON and back is a solved problem - check it out some time; it may surprise you.

          • ajanuary 9 years ago

            Are you talking about json-to-xml and xml-to-json?

            From the XSLT spec [0]:

            "Converts an XML tree, whose format corresponds to the XML representation of JSON defined in this specification, into a string conforming to the JSON grammar"

            It can't take an arbitrary XML document and turn it into JSON, it can only take XML documents that conform to a specific format.

            You can safely round-trip from JSON to XML and back to JSON. That's trivial because JSON's feature set is a subset of XML's.

            What you can't safely do is round-trip from arbitrary XML to JSON and back to XML. That's because, as the parent said, there are features in XML that don't exist in JSON. That means you are forced to find a way to encode it using the features you do have, but then you can't tell your encoding apart from valid values.

            [0] https://www.w3.org/TR/xslt-30/#func-xml-to-json

            • duskwuff 9 years ago

              You could conceivably serialize the DOM as a JSON object, but the representation would be very difficult to work with:

                  {
                    "type": "element",
                    "name": "blink",
                    "attributes": {
                      "foo": "bar"
                    },
                    "children": [
                      {
                        "type": "text",
                        "content": "example text"
                      }
                    ]
                  }

  • matthewaveryusa 9 years ago

    Once you've peeked at the complexity of some of the xml parsers (like xerces, oh god xerces), undoubtedly you'll want to avoid it like the plague. xml can get crazy-bananas very quickly. I fundamentally don't understand using xml (just like I don't understand asn1) for anything beyond historical purposes.

  • oefrha 9 years ago

    Yep, we don't really need another syndication format that no reader is going to support, or support well, for years. All I see missing in RFC 4287 is a per-entry cover image/thumbnail, which you can solve with an extension (which no one supports, and that's kind of the point) anyway.

  • revelation 9 years ago

    Yeah this is great, now instead of properly machine-readable and verifiable XSD files we have pseudo-RFC text on some shitty GitHub page.

  • robgibbons 9 years ago

    JSON, given the same schema, will always be more efficient byte-for-byte than XML. In addition, JSON as a format is native to JavaScript, which itself is ubiquitous. That's not even mentioning raw readability/writability.

    Basically, XML is to JSON as SOAP is to REST. It had its day, and though it's obviously still useful, we have better tools now. Frankly, I'm surprised we haven't seen a proposal like this sooner.

    • stephenr 9 years ago

      > XML is to JSON as SOAP is to REST

      That's true. Both XML and SOAP are well defined, and well structured.

      JSON and REST are both marginally defined, and thus we see constant incompatible/incomplete implementations, or weird hacks to overcome the shortcomings.

      > we have better tools now

      I think "the cool kids are cargo-culting something newer now" is probably more accurate.

      • icebraining 9 years ago

        Nitpick: REST is very well defined. It's not just a protocol, like some people insist.

        Other than that, fully in agreement.

        • stephenr 9 years ago

          REST is effectively a concept, and it's up to developers to follow the rules it sets.

          You can't take your codebase, add some glue code to a REST module, and know that it will be usable by any other REST consumer/client, because no one follows the guidelines exactly the same way.

  • liuyanghejerry 9 years ago

    Part of me is also with you - JSON is indeed smaller than XML, but we do have gzip almost everywhere around the web, and with gzip they don't differ that much in size. Also, if people really care about this, why don't they use a binary format, something like protobuf?

    And the other part of me is not with you - manipulating XML is not as easy as JSON in most of my development time, and sometimes I even need to write something by hand, for which JSON is much more handy. Tons of other formats are more human-friendly than JSON, for example TOML, but they don't have the status JSON has. So I guess JSON is kinda the choice for the current state of "web development times".

  • hyperpallium 9 years ago

    In practice, json is much easier to work with on the command line because of jq.

  • wcummings 9 years ago

    No, we don't. This doesn't do anything except break compatibility.

  • weberc2 9 years ago

    Yikes, you didn't even make it to the second sentence.

    > JSON is simpler to read and write, and it’s less prone to bugs.

    • mindcrime 9 years ago

      JSON is simpler to read and write, and it’s less prone to bugs.

      I don't actually find either of those things to be true.

      • jack9 9 years ago

        I simply think you're lying to yourself. It's both literally and theoretically simpler to write and digest; let's start with the simplest case: {}. Whether it's prone to bugs is a matter of debate, depending on a number of factors.

      • weberc2 9 years ago

        That's a fine opinion to have, but that doesn't mean that people (the authors or devs generally) use JSON out of vanity. As an aside, you're the first person I've heard suggest people put their identity in a serialization format, which gave me a good laugh.

    • jwilk 9 years ago

      From the HN guidelines:

      Please don't insinuate that someone hasn't read an article.

      • weberc2 9 years ago

        My mistake for phrasing my point in a manner that violates HN guidelines. I tried to edit, but I missed the window. At any rate, my point stands.

  • metheus 9 years ago

    We don't prefer JSON to XML for any reason other than that XML is terrible by comparison.

    • dangerlibrary 9 years ago

      It's funny to me that at the same time people are flocking to languages with strong, flexible type systems (often with compile-time checks), we are fleeing from a strongly typed data interchange format in favor of a dynamic bag of objects and arrays.

      • algesten 9 years ago

        I think that's because even if the data interchange format is strongly typed, as a consumer you often still must expect _anything_.

        I've yet to work on a project that handles XML where we have an XSD prevalidation step that makes the reading of some deeply nested XML tag feel safe.

        Unless we count XML <-> data object binding back in the java days. Not sure that felt any better...

        • eropple 9 years ago

          On the flip side, I've only ever not had an XSD when I was building something myself and actively didn't care.

          The truth, I tend to suspect, lies somewhere in between. =)

    • bdr 9 years ago

      That's not a reason.

russellbeattie 9 years ago

For anyone who's tried to write a real-world RSS feed reader, this format does little to solve the big problems that newsfeeds have:

* Badly formed XML? Check. There might be badly formed JSON, but I tend to think it'll be a lot less likely.

* Need to continually poll servers for updates? Miss. Without additions to enable pubsub, or dynamic queries, clients are forced to use HTTP headers to check last updates, then do a delta on the entire feed if there is new or updated content. Also, if you missed 10 updates, and the feed only contains the last 5 items, then you lose information. This is the nature of a document-centric feed meant to be served as a static file. But it's 2017 now, and it's incredibly rare that a feed isn't created dynamically. A new feed spec should incorporate that reality.

* Complete understanding of modern content types besides blog posts? Miss. The last time I went through a huge list of feeds for testing, I found there were over 50 commonly used namespaces and over 300 unique fields used. RSS is used for everything from search results to Twitter posts to Podcasts... It's hard to describe all the different forms of data it can contain. The reason for this is that the original RSS spec was so minimal (there's like 5 required fields) that everything else has just been bolted on. JSONFeed makes this same mistake.

* An understanding that separate but equal isn't equal. Miss. The thing that http://activitystrea.ms got right was the realization that copying content into a feed just ends up diluting the original content formatting, so instead it just contains metadata and points to the original source URL rather than trying to contain it. If JSONFeed wanted to really create a successor to RSS, it would spec out how to send formatting information along with the data. It's not impossible - look at what Google did with AMP: They specified a subset of formatting options so that each article can still contain a unique design, but limited the options to increase efficiency and limit bugs/chaos.

This stuff is just off the top of my head. If you're going to make a new feed format in 2017, I'm sorry but copying what came before it and throwing it into JSON just isn't enough.

  • hboon 9 years ago

    FWIW, This is by Manton Reece and Brent Simmons. And Simmons is known (among other things) as the creator of NetNewsWire which has been around for more than 15 years. He does know a bit about Atom and RSS feeds.

    https://en.wikipedia.org/wiki/NetNewsWire

    • nothrabannosir 9 years ago

      Ok, I have no idea who these guys are, so forgive me being rude: if they're so good, then why did they not address those points? To my eyes, OP makes a solid argument. I'd like to know their side of the story.

      • yoz-y 9 years ago

        But they did...

        > Badly formed XML? Check. There might be badly formed JSON, but I tend to think it'll be a lot less likely.

        The problem with XML is mostly that it is a very complex format so the bugs are more probable and there are more pitfalls.

        > Need to continually poll servers for updates? Miss. Without additions to enable pubsub, or dynamic queries ...

        They actually did add tags to enable WebSub (previously called pubsub), so there goes that. For the other concerns, I think it is not the format's job to care for partial or incomplete data. Nothing prevents you from having a dynamic link with an "updatesSince" on your webpage and serving all of the articles that were added or updated after that. Nowhere does the format specify a limit on the number of items. It also incorporates paging out of the box, so you could bubble up any old articles.
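
        For illustration, a feed using both mechanisms might look like this (sketched as a Python literal; hubs and next_url are the spec's fields for subscription endpoints and paging):

            feed = {
                "version": "https://jsonfeed.org/version/1",
                "title": "My Example Feed",
                "feed_url": "https://example.org/feed.json",
                # Real-time subscription endpoints, instead of polling:
                "hubs": [{"type": "WebSub", "url": "https://example.org/websub"}],
                # Paging: older items live one hop away:
                "next_url": "https://example.org/feed.json?page=2",
                "items": [],
            }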

        > Complete understanding of modern content types besides blog posts? Miss.

        The point of this is for the open web, by definition nobody can anticipate all formats. Rather than fill the spec with tweets, facebook and other types, they have opted for the least common denominator and added a specific way to add extensions. This makes way more sense.

        > An understanding that separate but equal isn't equal. Miss.

        Nothing actually prevents you from leaving the content fields blank and relying on the reader to pull the content. But for this kind of usage there are other methods. Personally I prefer content delivered in the RSS precisely to avoid having to deal with customization of content formatting. JSON feed HAS a way to specify formatting though, it's called HTML tags. No need to reinvent the wheel here.

        • russellbeattie 9 years ago

          I don't agree with most of what you wrote, but the "it's called HTML tags" is the most wrong. You must not have tried this any time in the past 5 years or so. The embedded tags come out of CMSs and - when they're not stripped completely - look like <div class="title-main-sub-1"> and <span class="sub-article-v5-bld">. HTML isn't used alone, it's always used with CSS nowadays, and whether or not semantic tags are best practice, the fact is they're optional and regularly not used. If they're going to create a new standard format, they need to address this.

          • yoz-y 9 years ago

            What is the difference between re-publishing the content in some other format which will do formatting well and re-publishing the content using sensible html tags with maybe some embedded minimal stylesheet?

            There might be misuse and abuse, but if you want to avoid that you can always push markdown into the "text" representation.

    • toyg 9 years ago

      One has to wonder whether Simmons is just trying to revive the old RSS ecosystem. "What do developers like these days, JSON? Let's do RSS in JSON!" ... This does not help.

      The real challenge these days is to replicate the solutions Facebook and Twitter brought to feeds (bidirectionality and data-retention in particular) in a decentralised manner that could actually become popular. Simply replicating RSS in the data-format du jour is not going to achieve that.

  • lucideer 9 years ago

    > Need to continually poll servers for updates? Miss. Without additions to enable pub sub, or dynamic queries, clients are forced to use HTTP headers to check last updates, then do a delta on the entire feed if there is new or updated content.

    This is backwards, imo. The advantage of polling over pub sub is that all complexity is offloaded to the client. This comes with its own set of problems (inefficiency of reinventing the wheel across all clients, plus every client will implement that complexity differently resulting in countless bugs), but this is what drives adoption, which as someone else here has pointed out is all that matters. If you want adoption, you seemingly need to sacrifice a lot of efficiency in favour of making it stupidly easy to publish.

    The "it's 2017 now" argument doesn't really address that even with dynamically generated content, you still need every dynamic serverside platform to adopt and implement your spec independently. Static is always easier. (plus with the recent trend towards static sites, "it's 2017 now" actually has the opposite implication).

  • mindcrime 9 years ago

    The thing that http://activitystrea.ms got right was the realization that copying content into a feed just ends up diluting the original content formatting, so instead it just contains metadata and points to the original source URL rather than trying to contain it.

    It's a shame that ActivityStrea.ms hasn't had more uptake. We've added support in our enterprise social network product and think it enables some cool scenarios. But unfortunately too few other products support it these days.

  • derefr 9 years ago

    > Need to continually poll servers for updates? Miss.

    The point of these syndication formats (RSS, Atom, now this) was always to act as the "I'm a static site and webhooks don't exist, so poll me" equivalent of webhooks. These "pretending-to-be-webhooks" were supposed to hook into a whole ecosystem of syndication middleware that turned the feeds into things like emails.

    And that—the output-products of the middleware—was what people were supposed to consume, and what sites were meant to offer people to consume. The feed, as originally designed, was not intended for client consumption. That's why the whole model we have today, where millions of "feed-reader" clients poll these little websites that could never stand up to that load, seems so silly: it wasn't supposed to be the model. RSS feeds were supposed to be a way for static-ish content to "talk to" servers that would do the syndicating for them; not a format for clients to receive notifications in.

    (And we already had a format for clients to receive notifications in: MIME email. There's no reason you can't add another MIME format beyond text/plain and text/html; and there's no reason you can't create an IMAP "feed-reader" that just filters your inbox to display only the messages containing application/rss+xml representations, and set up your regular inbox to filter out those same messages. And some messages would contain both representations, so you'd see e.g. newsletters as both text in your email client and as links in your feed client, and archiving them in one would do the same in the other, since they're the same message.)
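
    A sketch of that idea with Python's email library: one MIME message carrying both a readable body and a machine-readable feed-item part (the rss+xml payload is a placeholder, not a full document):

        from email.message import EmailMessage

        msg = EmailMessage()
        msg["Subject"] = "New post: Hello, world!"
        msg.set_content("Read it at https://example.org/initial-post")

        # Attach an alternative representation a "feed-reader" mail client
        # could filter on, while normal clients show the text part.
        msg.add_alternative(
            b'<rss version="2.0"><channel><item><link>'
            b"https://example.org/initial-post</link></item></channel></rss>",
            maintype="application",
            subtype="rss+xml",
        )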

    ---

    The big problem I have with feeds (besides that people are using them wrong, as above) is that they have no "control-channel events" to notify a feed-consumer of something like e.g. the feed relocating to a new URL.

    Right now, many feeds I follow just die, never adding a new feed item, and the reason for that is that, unbeknownst to me, the final item in the feed (that I never saw because it rotted away after 30 days, or because I "declared inbox zero" on my feeds, or whatever else) was a text post by the feed's author telling everyone to follow some new feed instead.

    And other authors don't even bother with that; they use a blogging framework that generates RSS, but they're maybe not even aware that it does that for them, so instead they tell e.g. their Twitter followers, or Twitch subscribers, that they're moving to a new website, but their old website just sits there untouched forever-after, never receiving an update to point to the new site which would end up in the RSS feed my reader is subscribed to. It's nonsensical.

    (And don't get me started on the fact that if you follow a Tumblr blog's RSS feed, and the blog author decides to rename their account, that not only kills the feed, but also causes all the permalinks to become invalid, rather than making them redirect... Tumblr isn't alone in this behavior, but Tumblr authors really like renaming their accounts, so you notice it a lot.)

    • ttepasse 9 years ago

      HTTP 301 Moved Permanently is the out of band control channel. Sometimes it even seems to work, depending on software of course.

      There was also a typical Dave-Wineresque invention of replacing the old feed with some special, non-namespaced XML with the redirect: http://www.rssboard.org/redirect-rss-feed

      But of course the real problem is social. As in, people simply stop blogging or stop caring. And of course tool developers don't care if someone doesn't want to use their software anymore, and don't think of developing the right buttons for this edge case.

      • derefr 9 years ago

        > HTTP 301 Moved Permanently is the out of band control channel.

        True, but it requires you to be able to set response codes on the server. I can't make my GitHub Pages site, or my Tumblr blog, or my S3 bucket emit a 301. And those are the sorts of things RSS was designed for: static sites that can't just, say, tell their backend to email people on update. You'd think that, knowing that, RSS et al. would have been designed with in-band, rather than out-of-band, control.
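        (Purely hypothetical illustration -- no feed spec defines anything like this -- but an in-band control event could have been as simple as a final item along these lines:)

            {
                "id": "relocation-notice",
                "_control": {
                    "type": "feed_moved",
                    "new_feed_url": "https://new.example.org/feed.json"
                }
            }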

CharlesW 9 years ago

Dave Winer (the creator of RSS) played with this a bit in 2012. It turns out that the exact format of feeds doesn't matter nearly as much as there being a more-or-less universal one.

http://scripting.com/stories/2012/09/10/rssInJsonForReal.htm...

  • AceJohnny2 9 years ago

    I'm sure there's...

    oh of course: https://xkcd.com/927/

    (and I realize this doesn't exactly map, as JSON Feed isn't even trying to cover all the use cases of Atom or RSS, just switching the container format)

gedrap 9 years ago

But does it solve any actual problems other than 'XML is not cool', problems big enough to deserve a new format?

It's true that JSON is easier to deal with than XML. But that's relative; there are plenty of decent tools around RSS, from readers, to libraries in the most common programming languages, to extensions in the most common content management systems. JSON is slightly easier for a human to read (although that's subjective), but how often do you need to read an RSS feed manually unless you're the one writing those libraries? That's a tiny share of all the people using RSS.

>>> It reflects the lessons learned from our years of work reading and publishing feeds.

Sounds like the authors have extensive experience in this field and know things better than some random person on the internet (me). But the project's homepage doesn't convey those learned lessons.

  • tannhaeuser 9 years ago

    Yes, JSON is much easier to parse than XML, and it's preferred where it fits, such as for most Web API requests and responses.

    However, SGML and XML were invented as structured markup languages for authoring of rich text documents by humans, for which JSON is unsuited and sucks just as much as XML sucks for APIs.

    Edit: though XML has its place in many b2b and business-to-government data exchanges (financial and tax reporting, medical data exchange, and many others) where a robust and capable up-front data format specification for complex data is required

zeveb 9 years ago

If we're going to talk about replacing XML with better data formats, why not switch to S-expressions?

    (feed
     (version https://jsonfeed.org/version/1)
     (title "My Example Feed")
     (home-page-url https://example.org)
     (feed-url https://example.org/feed.json)
     (items
      (item (id 2)
            (content-text "This is a second item.")
            (url https://example.org/second-item))
      (item (id 1)
            (content-html "<p>Hello, world!</p>")
            (url https://example.org/initial-post))))
This looks much nicer IMHO than their first example:

    {
        "version": "https://jsonfeed.org/version/1",
        "title": "My Example Feed",
        "home_page_url": "https://example.org/",
        "feed_url": "https://example.org/feed.json",
        "items": [
            {
                "id": "2",
                "content_text": "This is a second item.",
                "url": "https://example.org/second-item"
            },
            {
                "id": "1",
                "content_html": "<p>Hello, world!</p>",
                "url": "https://example.org/initial-post"
            }
        ]
    }
  • krapp 9 years ago

    It looks nicer if you happen to like s-expressions. But to me, it's just replacing one flavor of clutter with another. The best reason not to prefer s-expressions to JSON, though, is simply that JSON is already natively supported in browsers, while s-expressions would need a parser written in a language that already parses JSON.

  • JoelSanchez 9 years ago

    There's EDN, which is to Clojure what JSON is to JS: a format close to the language's way of representing data.

    https://github.com/edn-format/edn

    Example:

    https://github.com/milikicn/activity-stream-example/blob/4db...

    Not S-expression-based, though.

  • draegtun 9 years ago

    JSON was influenced by Rebol, which I feel would provide an even nicer example:

        version: https://jsonfeed.org/version/1
        title: "My Example Feed"
        home-page-url: https://example.org
        feed-url: https://example.org/feed.json
        items: [
            [
                id: 2
                content-text: "This is a second item."
                url: https://example.org/second-item
            ]
            [
                id: 1
                content-html: "<p>Hello, world!</p>"
                url: https://example.org/initial-post
            ]
        ]
  • hajile 9 years ago

    Those two aren't comparable because you cannot distinguish between key:val pairs and lists. You need dotted lists.

    • zeveb 9 years ago

      Nope, one would normally write the code which parses such S-expressions such that the first atom in each list indicates the function to use to parse the rest of the list. So there'd be a FEED-FEED function which knows that a feed may have version, homepage URL &c., and there'd be a FEED-ITEMS function which expects the rest of its list to be items, a FEED-ITEMS-ITEM function which knows about the permissible components of an item &c.

      If you really want to do a hash table, you could represent it as an alist:

          (things
            (key1 val1)
            (key2 val2))
      
      This all works because — whether using JSON, S-expressions or XML — ultimately you need something which can make sense of the parsed data structure. Even using JSON, nothing prevents a client submitting a feed with, say, a homepage URL of {"this": "was a mistake"}; just parsing it as JSON is insufficient to determine if it's valid. Likewise, an S-expression parser can render the example, but it still needs to be validated. One nice advantage of the S-expression example is that there's an obvious place to put all the validation, and an obvious way to turn the parsed S-expression into a valid object.
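      (A rough Python sketch of that dispatch idea, assuming the S-expression has already been read into nested lists; the handler table and field names are just illustrations:)

          # A sketch, assuming the S-expression has already been read into
          # nested Python lists, e.g.:
          #   ["feed", ["title", "My Example Feed"],
          #            ["items", ["item", ["id", "2"], ["url", "..."]]]]

          def parse_leaf(rest):
              if len(rest) != 1:
                  raise ValueError("expected exactly one value, got %r" % (rest,))
              return rest[0]

          def parse_fields(forms):
              # the first atom of each sub-list selects the handler for the rest
              return {head: HANDLERS.get(head, parse_leaf)(rest)
                      for head, *rest in forms}

          def parse_items(items):
              return [parse_fields(item[1:]) for item in items]  # drop the "item" tag

          HANDLERS = {"items": parse_items}

          def parse_feed(sexp):
              head, *forms = sexp
              if head != "feed":
                  raise ValueError("root form must be a feed")
              return parse_fields(forms)
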
      • hajile 9 years ago

        In the absolute abstract, you are correct. In the absolute abstract, you could replace parens with significant whitespace and have zero visible syntax.

        In practice, Lisp adopted dotted lists 60 years ago, and basically every Lisp since has used them as one way to represent an association list. Minimal syntax is better than zero syntax or loads of syntax.

    • Johnny_Brahms 9 years ago

      There is of course SXML, which is more or less well-defined. But that would probably suffer from the same problems as XML, since it is just XML written as s-expressions.

      There is one pretty damn solid SSAX parser by Kiselyov that has been ported to just about every Scheme out there. It is interesting since it doesn't do the whole callback thing of most SAX parsers, but is instead implemented as a tree fold over the XML structure.

  • foxhill 9 years ago

    Beauty is in the eye of the beholder - I personally prefer the JSON.

Communitivity 9 years ago

It is worth pointing out that there is a relevant W3C Recommendation, "JSON Activity Streams": https://www.w3.org/TR/activitystreams-core/ . I'm not saying JSON Feed is worse, or better. I am saying that I think JSON Feed's adoption requires a detailed comparison between JSON Feed and JSON Activity Streams 2.0.

eric_the_read 9 years ago

A few thoughts on the spec itself:

* In all cases (feed and items), the author field should be an array to allow for feeds with more than one author (for instance, a podcast might want to use this field for each of its hosts, or possibly even guests).

* external_url should probably be an array, too, in case you want to refer to multiple external resources about a specific topic; in the case of a linkblog or podcast that discusses multiple topics, it could link to each subtopic.

* It might be nice if an item's ID could be constrained to a specific format, even if perhaps only within a single feed. Otherwise it's hard to know how to interpret posts with IDs like "potato", 1, null, "http://cheez.burger/arghlebarghle".

  • derefr 9 years ago

    > a podcast might want to use this field for each of its hosts, or possibly even guests

    I'm going to pretend this is about music artists in a music library, but the logic is exactly the same for podcast hosts:

    You tend to want fields like this to be singular, so that the field can be used in collation (i.e. "sort by artist.")

    If you have multiple artists for a track, usually one can be designated the "primary" artist—the one that people best know, and would expect to find the track listed under when looking through their library. Usually, then, the rest get tacked on in the field in a freeform, maybe comma-and-space delimited fashion. The field isn't a strict strongly-typed references(Person) field, after all; it's just freeform text describing the authorship.

    But as for hosts vs. guests, that's a whole can of worms. Look at the ID3 standard. Even though music library-management programs usually just surface an "Artist" field, you've actually got all of these separate (optional) fields embedded in each track:

    • TCOM: Composer

    • TEXT: Lyricist/Text writer

    • TPE1: Lead performer(s)/Soloist(s)

    • WOAR: Official artist/performer webpage

    • TPE2: Band/orchestra/accompaniment

    • TPE3: Conductor/performer refinement

    • TPE4: Interpreted, remixed, or otherwise modified by

    • TENC: Encoded by

    • WOAS: Official audio source webpage

    • TCOP: Copyright message

    • WPUB: Publishers official webpage

    • TRSN: Internet radio station name

    • TRSO: Internet radio station owner

    • WORS: Official internet radio station homepage

    That gives you separate credits for pretty much the entire composition, production and distribution flow, which usually means that each field only needs one entry.

    Would be great if people used them, wouldn't it? Maybe the semi-standard "A feat. B (C remix)" microformat could be parsed into "[TPE2] feat. [TPE1] ([TPE4])"...
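    (A rough sketch of what parsing that microformat could look like -- the regex and the field mapping are guesses at the convention, not any standard:)

        import re

        # Hypothetical mapping of "A feat. B (C remix)" onto TPE2 / TPE1 / TPE4.
        PATTERN = re.compile(
            r"^(?P<tpe2>.+?)"                    # main credited artist -> TPE2
            r"(?: feat\. (?P<tpe1>.+?))?"        # featured performer(s) -> TPE1
            r"(?: \((?P<tpe4>.+?) remix\))?$",   # remixer -> TPE4
            re.IGNORECASE,
        )

        print(PATTERN.match("A feat. B (C remix)").groupdict())
        # {'tpe2': 'A', 'tpe1': 'B', 'tpe4': 'C'}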

  • smilbandit 9 years ago

    I was thinking that all the urls and images should have been in arrays.

    • eric_the_read 9 years ago

      Probably, but I think the goal there is to have something you can display on a summary page with a list of items or episodes, where there's just an icon for each (and a banner image for the background or header or some such). For that purpose I think a single image is fine (I totally get wanting more than one, though, and I'm happy to be wrong here).

jerf 9 years ago

I would suggest specifying titles as html, not plain text. I've seen too many things titled "I <i>love</i> science!" over the years to believe in the idea that titles are plain text.

Also, despite the fact that this is technically not the responsibility of the spec itself, I would strongly suggest some words on the implications of the fact that the HTML fields are indeed HTML, and on the wisdom of passing them through some sort of HTML filter before displaying them.

In fact that's also part of why I suggest going ahead and letting titles contain HTML. All HTML is going to need to be filtered anyhow, and it's OK for clients to filter titles to a smaller valid tag list, or even filter out all tags. Suggesting (but not mandating) a very basic list of tags for that field might be a good compromise.
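(As one possible shape for that filtering -- a sketch using the third-party bleach sanitizer; the tag whitelist here is purely illustrative, not anything the spec suggests:)

    import bleach  # third-party; pip install bleach

    raw_title = 'I <i>love</i> science! <script>alert(1)</script>'

    # Filter a title down to a tiny whitelist; strip=True removes
    # disallowed tags instead of escaping them.
    print(bleach.clean(raw_title, tags={"b", "i", "em", "strong"}, strip=True))
    # I <i>love</i> science! alert(1)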

  • ergothus 9 years ago

    Allowing HTML means the other side will have to sanitize that HTML (to avoid XSS). Using text means you can stick it in the DOM via innerText and be much more confident that you aren't injecting XSS.

    I agree that I see HTML in RSS titles, but I'd rather have the occasional garbled title, which the author can fix by stripping HTML out before generating the RSS, than require that every RSS reader avoid opening up new security holes.

    • jerf 9 years ago

      There is no way to avoid having to handle HTML safely. There's no point in trying to limit your exposure to that problem when the entire point of this standard is to ship around arbitrary HTML for interfaces to display. Once you've solved the hard problem of displaying the body safely, displaying the title is trivial. Making the title pure text does nothing useful. JSONFeed display mechanisms that are going to get this wrong are going to do things like leave injections in the date fields anyhow.

  • ianburrell 9 years ago

    Following the separation of content_text and content_html attributes, it would make sense to have title_html and title_text attributes.

jawns 9 years ago

> It's at version 1, which may be the only version ever needed.

Wow. Now that's confidence. Have you ever read the first version of a spec and thought, "That's just perfect. Any additional changes would just be a disappointment compared with the original"?

  • Johnny_Brahms 9 years ago

    MIDI 1.0 is maybe not perfect, but it is still unchanged since 1983. People have tried to replace it for two decades, but have failed to provide any enhancements worth a switch.

    But MIDI doesn't really fit that description, since it built on two years of prior work by Roland. It's my best bet, though.

  • vanderZwan 9 years ago

    In all fairness, they're taking a more or less solved problem (feeds), so they don't really have to figure things out there, and they're porting this established solution to a very well-established technology (JSON), so also don't really have to figure stuff out in that sense either.

    As far as scenarios where it's feasible to get the answer right the first time go, this is a reasonably realistic one.

    EDIT: Also, if you scroll to the bottom of the page you can see they have let a whole bunch of people look at the spec before releasing it, so there has been at least some peer review.

  • efsavage 9 years ago

    Unsurprising as this is clearly an ego play, given that the first thing they want you to know is their names.

  • smacktoward 9 years ago

    "Now it belongs to the ages!"

pimlottc 9 years ago

> JSON is simpler to read and write, and it’s less prone to bugs.

Less prone to bugs? How's that?

  • bastawhiz 9 years ago

    Consider XML entity bombs. You need to explicitly tell your XML parser not to follow the spec to prevent malicious sources of XML from crashing your application. XML also has a lot of room for syntax errors, with many types of tokens and escape rules. JSON, by comparison, does not.

    • jjawssd 9 years ago

      > XML also has a lot of room for syntax errors, with many types of tokens and escape rules. JSON, by comparison, does not.

      Parsing JSON is a minefield.

      Take a look at how a bunch of parsers perform with various payloads: http://seriot.ch/json/pruned_results.png (yellow and light blue boxes highlight the worst situations for applications using the specified parser).

      "JSON is the de facto standard when it comes to (un)serialising and exchanging data in web and mobile programming. But how well do you really know JSON? We'll read the specifications and write test cases together. We'll test common JSON libraries against our test cases. I'll show that JSON is not the easy, idealised format as many do believe. Indeed, I did not find two libraries that exhibit the very same behaviour. Moreover, I found that edge cases and maliciously crafted payloads can cause bugs, crashes and denial of services, mainly because JSON libraries rely on specifications that have evolved over time and that left many details loosely specified or not specified at all."

      More details available at: http://seriot.ch/parsing_json.php
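      (This is easy to demonstrate even without crafted payloads; for example, CPython's standard json module accepts literals the RFC doesn't define:)

          import json

          # Neither literal is valid JSON per RFC 7159, but CPython's
          # parser accepts both by default:
          print(json.loads("NaN"))    # nan
          print(json.loads("1e999"))  # inf (overflow turns into infinity)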

      • floatboth 9 years ago

        None of these issues are as bad as the XML ones. You generally don't need a "defusedjson" the way you need https://pypi.python.org/pypi/defusedxml to guard against payloads like:

            <!DOCTYPE external [
              <!ENTITY ee SYSTEM "file:///etc/ssh/ssh_host_ed25519_key">
            ]>
            <root>&ee;</root>
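        (A quick sketch of defusedxml rejecting that payload up front:)

            import defusedxml.ElementTree as DET  # third-party; pip install defusedxml

            payload = ('<!DOCTYPE external ['
                       '<!ENTITY ee SYSTEM "file:///etc/ssh/ssh_host_ed25519_key">]>'
                       '<root>&ee;</root>')

            try:
                DET.fromstring(payload)
            except Exception as exc:
                print(type(exc).__name__)  # EntitiesForbidden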

      • bastawhiz 9 years ago

        Parser correctness is irrelevant when you're talking about the ability to be written with few syntax errors. For instance, JSON has one type of string with one set of string escape rules. XML has element names, attribute names, attribute values, text nodes, CDATA content, RCDATA content, and more. And almost all of them have different rules for what they can contain and how they can be used.

        By comparison, XML is orders of magnitude more complex than JSON.

    • josteink 9 years ago

      > XML also has a lot of room for syntax errors,

      No it doesn't. XML is either well formed or not, and any parser encountering non well-formed XML will reject it outright.

      Therefore all XML in use on the internet is spec-compliant.

      Now try to say the same about JSON.

      • bastawhiz 9 years ago

        > any parser encountering non well-formed XML will reject it outright.

        Ah, I see you're new to parsing XML.

        • Aeolun 9 years ago

          Oh, it will be rejected alright. And then you're forced to override the parser, or to manipulate the XML before parsing it because it makes business sense to not have the source fix their XML for some reason.

          People and machines are just utterly incapable of outputting valid XML.

  • crdoconnor 9 years ago

    JSON parsers have a much smaller 'feature surface', meaning that there are fewer nooks and crannies for bugs to live in.

    One example of a bug that often festered in XML parsers: https://en.wikipedia.org/wiki/Billion_laughs (there is no JSON equivalent of this)

    The generalized theory, for those interested: https://en.wikipedia.org/wiki/Rule_of_least_power

  • jonknee 9 years ago

    Probably this part:

    > simpler to read and write

    • halloij 9 years ago

      If you're writing these things by hand, you're probably doing something wrong...

      • PeterisP 9 years ago

        Deserializing somebody else's XML into usable internal data structures generally requires writing serialization/deserialization code by hand, and it is always a pain in the ass. JSON's basic structures, on the other hand, map to reasonable internal representations, so I can often simply iterate through the structures coming as-is from the parser library.

        I mean, if the same web service offers the same data in both XML and JSON, chances are I'd have to write less code to handle the JSON endpoint. For a client written in e.g. Java the two cases may be pretty much equal, but for dynamic languages like JavaScript or Python, the difference is significant.
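        (A small illustration of the difference, with made-up payloads and only standard-library parsers:)

            import json
            import xml.etree.ElementTree as ET

            # JSON maps straight onto dicts and lists:
            feed = json.loads('{"items": [{"id": "1", "url": "https://example.org/a"}]}')
            for item in feed["items"]:
                print(item["id"], item["url"])

            # XML hands you an element tree that still has to be walked:
            doc = ET.fromstring(
                "<feed><item><id>1</id><url>https://example.org/a</url></item></feed>")
            for item in doc.findall("item"):
                print(item.findtext("id"), item.findtext("url"))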

      • ralmeida 9 years ago

        This is a straw man, IMO. Obviously, in production, the actual JSONs will interact very little with humans. But there's still development, debugging, etc.

        So you will need to write small cases during development, tweak existing cases, etc.

        Also, many tools accept configuration in JSON, which is somewhat convenient to write by hand, and is easily machine readable. Sublime Text comes to mind, for example.

      • jonknee 9 years ago

        JSON is also easier for computers to read and write...

        • halloij 9 years ago

          XML generators and parsers have been in use for a decade+. Pretty sure most of the bugs have been found and fixed by now.

          It's just reinventing the wheel because the new generation don't want to use the same tools the previous generation did. The time and effort spent doing this is quite ridiculous.

          (FWIW, I hate XML, and JSON is far better. But there are more important things to work on.)

          • ergothus 9 years ago

            > Pretty sure most of the bugs have been found and fixed by now.

            Given the complexity and what I've seen from some other long established codebases, I don't share your confidence.

            > It's just reinventing the wheel because the new generation don't want to use the same tools the previous generation did.

            You can disagree with the decisions involved (as you did with the XML vulnerability argument), but the fact that those arguments exist means they AREN'T doing it just because they don't want to use the same tools the previous generation did - they have different reasons that you think aren't good reasons.

            Saying it as you did comes across as smug and dismissive, which is not an effective way of convincing your audience that you've taken arguments into account when making your decision.

  • skybrian 9 years ago

    RSS is sometimes ambiguous and there's a lot of variation. It can be hard to parse correctly. Not sure about Atom, though.

    • CharlesW 9 years ago

      > RSS is sometimes ambiguous and there's a lot of variation.

      I've written a reasonably-popular podcast feed validator, and I don't understand either of these criticisms. Mind elaborating?

      • nmcfarl 9 years ago

        Not the parent, but my company consumed a fair bit of RSS starting in 2005 (with the amount declining to zero through the years).

        Over time we've been fed feeds with character encodings matching neither what the web server nor the XML declared; use of undeclared XML namespaces; or, quite popular, elements from other namespaces used without namespaces or declarations -- just shove some nice iTunes things or Atom things into the RSS. Also invalid XML -- just skipping the closing tags was popular.

        These feeds were from paying customers, and we were not the primary consumers -- so when we complained they would generally point to someone else who was consuming the feed without problems. Sometimes we'd point them at a validator, if they were a small enough customer -- but mostly we just kept working on our in-house RSS feed reader that could read tag soup.

        Things did massively improve over time, and by the end we were getting _mainly_ reasonably valid RSS.

      • thousande 9 years ago

        I haven't written XML parsers, but I remember Nick Bradbury, creator of FeedDemon, wrote about this a lot back in the day:

        * https://nickbradbury.com/2006/09/21/fixing_funky_fe_1/

        * http://nick.typepad.com/blog/2004/01/feeddemon_and_w.html

        * https://en.wikipedia.org/wiki/FeedDemon

      • skybrian 9 years ago

        Since you've done it recently, I'm sure you know more than I do; I suspect my knowledge of it is obsolete.

      • bastawhiz 9 years ago

        > I've written a reasonably-popular podcast feed validator

        Mind sharing?

  • StevePerkins 9 years ago

    I couldn't help but take a dismissive stance toward the rest of the page after reading the first paragraph.

ttepasse 9 years ago

Shortly after RSS 0.9 came out, RSS 1.0 reformulated the RSS vocabulary in RDF terms. Of course, the modern (sane) successor to RDF/XML is JSON-LD.

So I'm hoping for JSON-LD Feed 1.1 and a new war of format battles. Maybe we can even get Mark Pilgrim out of hiding!

  • toyg 9 years ago

    Someone should open a social network for feed-wars veterans.

    More seriously, it's sad to see that almost 20 years later, the dream of a decentralised and bidirectional web is in even worse shape than it was back then.

    • bullen 9 years ago

      Yes, extend this to JSON pingback and bring back the decentralized social web.

einrealist 9 years ago

If you create a new JSON-based document format, please consider using JSON-LD (alongside the raw JSON data) so we can build a true web of interconnected data through semantic formats -- or at least so that I can generate code and automatically validate format compatibility from a well-defined schema. Thank you!
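(For example, a hypothetical @context mapping a couple of JSON Feed fields onto schema.org terms -- a sketch, not a proposal:)

    {
        "@context": {
            "title": "http://schema.org/headline",
            "home_page_url": {"@id": "http://schema.org/url", "@type": "@id"}
        },
        "title": "My Example Feed",
        "home_page_url": "https://example.org/"
    }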

EDIT: Because I got downvoted despite merely stating my opinion on the topic, I adjusted the statement.

gwu78 9 years ago

Is this a "JSON Feed" from NYTimes?

The example below extracts all article URLs for a specific section of the paper.

   test $# = 1 || { echo "usage: $0 section" >&2; exit 1; }

   # fetch the section front and print each item's "guid" URL
   curl -o 1.json https://static01.nyt.com/services/json/sectionfronts/$1/index.jsonp
   exec sed '/"guid" :/!d;s/",//;s/.*"//' 1.json
I guess SpiderBytes could be used for older articles?

Personally, I think a protocol like netstrings/bencode is better than JSON because it better respects the memory resources of the user's computer.

Every proposed protocol will have tradeoffs.

To me, RAM is sacred. I can "parse" netstrings in one pass, but I have been unable to do this with a state machine for JSON: I have to arbitrarily bound the nesting depth or risk a crash. As easy as it is to exhaust a user's available RAM with JavaScript, so too can it be done with JSON. Indeed, they go well together.
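(A sketch of why netstrings are friendlier to that constraint, assuming the usual "5:hello," wire format -- the length arrives before the payload, so a memory budget can be enforced in a single pass:)

    import io

    # Minimal one-pass netstring reader: b"5:hello," -> b"hello".
    # The length arrives before the payload, so the budget is enforced
    # before a single payload byte is buffered.
    MAX_LEN = 1 << 20  # arbitrary 1 MiB budget for this sketch

    def read_netstring(stream):
        digits = b""
        while (c := stream.read(1)) != b":":
            if not c.isdigit() or len(digits) > 9:
                raise ValueError("bad length prefix")
            digits += c
        size = int(digits)
        if size > MAX_LEN:
            raise ValueError("payload exceeds memory budget")
        payload = stream.read(size)
        if len(payload) != size or stream.read(1) != b",":
            raise ValueError("truncated or malformed netstring")
        return payload

    print(read_netstring(io.BytesIO(b"5:hello,")))  # b'hello'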

pedalpete 9 years ago

"JSON has become the developers’ choice for APIs", I'm curious about how people feel about this statement from a creation vs consumption perspective.

I'm currently creating an API where I'm asking devs to post JSON rather than a bunch of separate parameters, but I haven't seen this done in other APIs (if you have, can you point me to a few examples?). I'm curious what others' thoughts are on this. It seems that with GraphQL, we're maybe starting to move in this direction.

smilbandit 9 years ago

I'd like to see a language available at the item level. You can derive the language from the HTTP headers, but if you're dealing with linkblogs it would be nice to have it at the item level to help with filtering.

I think that images and URLs would do well as ordered lists rather than as individual values. At the top level you have three URLs plus an array for hubs; with a type and a url on each entry, you could use one array for the hubs and the URLs alike. The same could be done for images at the top level, and both again at the item level (see the sketch below).
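(Something like this hypothetical shape, mirroring how the hubs array already works -- a sketch, not part of the spec:)

    {
        "urls": [
            {"type": "home_page", "url": "https://example.org/"},
            {"type": "feed", "url": "https://example.org/feed.json"}
        ],
        "images": [
            {"type": "icon", "url": "https://example.org/icon.png"},
            {"type": "favicon", "url": "https://example.org/favicon.ico"}
        ]
    }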

niftich 9 years ago

It's unfortunate that XML has fallen so out of favor that well-made, strongly-schema'd formats specified in XML, like Atom, are suffering in turn -- although the reasons for feeds' demise go well beyond their forms on the wire. This trend frustrates me, but it's undeniable that a lot of web data interchange happens with JSON-based formats nowadays, and the benefits of network effects, familiarity, and tooling support make JSONification worth exploring.

But even more frustrating is when a format comes out that's close to being a faithful translation of an established format, but makes small, incompatible changes that push the burden of faithful translation onto content authors, or the makers of third-party libraries.

I honestly don't intend to offer harsh targeted critique of the authors -- I assume good faith; I'm more just voicing exasperation. There have been similar attempts over the years -- one from Dave Winer, the creator of RSS 0.92 and RSS 2.0, called RSS.js [1], which stoked some interest at first [2]; others by devs working in isolation, without seeming access to a search engine and completely unaware of prior art; some who are just trying something unrelated and accidentally produce something usable [3]; finally, this question pops up from time to time on forums where people with an interest in this subject tend to congregate [4]. Meanwhile, real standards bodies are off doing stuff that reframes the problem entirely [5] -- which seems out-of-touch at first, but I'd argue provides a better approach than a similar-but-not-entirely-compatible riff on something really old.

And as a meta-point: "people who use JSON-based formats", as a loose aggregate, have a serious and latent disagreement about whether data should have a schema or even a formal spec. In the beginning, when people first started using JSON instead of XML, it was done in a schemaless way, and making sense of it was strictly best-effort on the part of the receiving party. Then a movement appeared to bring schemas to JSON, which went against the original reason for using JSON in the first place, and now we're stuck with two camps playing in the same sandbox whose views, use-cases, and goals are contradictory. This appears to be a "classic" loose JSON format, not a strictly-schema'd one -- it doesn't even bother to declare its own media type. This invites criticism from the other camp, yet the authors are clearly not playing in that arena. What's the long-term solution here?

[1] http://scripting.com/stories/2012/09/10/rssInJsonForReal.htm... [2] https://core.trac.wordpress.org/ticket/25639 [3] http://www.giantflyingsaucer.com/blog/?p=3521 [4] https://groups.google.com/forum/#!topic/restful-json/gkaZl3A... [5] https://www.w3.org/TR/activitystreams-core/

0x006A 9 years ago

Why is it size_in_bytes and duration_in_seconds, as opposed to following the pattern of content_text and content_html?

It should just be size and duration, or size_bytes and duration_seconds (but adding units only makes sense if other units could be used). Adding _in to the mix is strange.

gumby 9 years ago

A good announcement explains what problem it is intending to solve.

bullen 9 years ago

I miss the distributed social pingback days!

Implemented: http://sprout.rupy.se/feed?json

voidfiles 9 years ago

This seems like a great idea. If it can help even one developer it's worth it.

  • CharlesW 9 years ago

    How would it help even one developer?

    Or asked another way, what problem does this solve for you?

    • voidfiles 9 years ago

      So, my personal blog doesn't get a ton of traffic, but the one article that gets the most traffic is about how to monkeypatch feedparser so it doesn't strip out embedded videos.

      While not hard evidence, I think it's indicative of the kind of experience a developer has when they choose to engage with syndication.

cocktailpeanuts 9 years ago

Doesn't Wordpress already have something like this? http://v2.wp-api.org/

I don't understand why suddenly people treat this like something that uniquely solves a problem. Maybe I'm missing something?

  • yoz-y 9 years ago

    This format is more akin to RSS than to a programmatic rest API. The main goal is to be able to avoid the pitfalls of parsing Atom and RSS feeds. Both Brent Simmons and Manton Reece are quite active in making decentralized alternatives for self publishing for which RSS is the current backbone.

    • donohoe 9 years ago

      Parsing RSS and Atom feeds is a solved problem, no?

      • yoz-y 9 years ago

        JSON Feed is a new solution to the problem already solved by RSS and Atom. It makes it easier to develop new publishers and consumers. It also tackles the main problems with those two formats, e.g. no realtime subscriptions, mandatory titles (which are a pain for microblogs), potential security problems with XML, and so on.

        As somebody somewhere once wrote: if no one had ever reinvented the wheel, our cars would still be rolling around on big wooden logs.

ozten 9 years ago

XML is awful, but it does have CDATA, which lets you embed blog posts directly, and it's easy to debug.

String-encoded blog posts are going to be painful once people start using the `content_html` part of the spec.

  • __david__ 9 years ago

    Naw, JSON has reasonable quoting in its strings. It's maybe painful to read the raw JSON, but it encodes just fine.

pswenson 9 years ago

I'm surprised no one has started a snake-case vs. camelCase debate here! https://jsonfeed.org/version/1

nilved 9 years ago

Good lord, Web people, stop it. You are embarrassing yourselves. We already have standards and you need to stop recreating everything in JavaScript.

  • frou_dh 9 years ago

    Brent Simmons is hardly some webdev kid barging in. He was the original developer of NetNewsWire, a very popular/influential feed reader application which is now 15(!) years old.

systematical 9 years ago

Who uses feeds? Who uses XML?

ehosca 9 years ago

stopped reading after "JSON is simpler to read and write, and it’s less prone to bugs." ....

donohoe 9 years ago

I have grave concerns that this publishing format is delivered to us by two people who, as far as I can see, have limited-to-zero publishing background.

That said, they're being responsive to questions in Issues, so I remain optimistic.
