Buf raises $93M to deprecate REST/JSON

buf.build

56 points by jbrandhorst 4 years ago · 90 comments

ChuckMcM 4 years ago

Wow, “Oh look, we’re going to re-invent the wheel for what is at least the third time.”

When I worked at Google I sat down one day across from one of the gRPC engineering leads, who was talking about the things they were doing for the then-current generation of gRPC. I asked if I could ask some questions about it; they agreed, and I then dissected their design, pointing out half a dozen ways it would fail, some merely irritating, some critical at scale. They were amazed that I had thought about this topic so deeply, as it was all “state of the art” and I was, nominally, “old.” I pointed out that I had been the ONC RPC architect at Sun in the ‘80s during the first RPC wars, and that while the implementations had changed, messaging dressed up as a procedure call has some fundamentally bad properties. These challenges manifest in all aspects of RPC, from marshaling data, to rendezvousing, to delivery reliability and guarantees. Andy Birrell at DEC SRC and Leslie Lamport had written dozens of papers looking at these challenges in systems small and large. There were literally decades of solid research that the engineer in front of me at the cafeteria that day was re-discovering from first principles.

RPC protocols from Sun, DEC, Microsoft, OSI, SOAP, the IETF, and the Open Group have run at this problem again and again and come up with different solutions, each with its own set of warts. Good for some things, not great for others. But at this point there are plenty of options.

What is missing from Buf’s material is what I might call the “Chesterton’s fence” material that dives into why all of these previous versions were insufficient and how their new version of gRPC will solve all those problems without adding new wrinkles.

I think it is great that they are trying to improve the state of the art; I would feel better about it if they also demonstrated they understood what had come before.

  • kentonv 4 years ago

    Note that Buf is building tooling around Protobuf/gRPC, not replacing them.

    Maybe I'm just another clueless millennial developer who doesn't understand the history of the 80's or whatever, but I've never been able to understand this claim that RPC is broken. There's a lot of assertions that everyone knows RPC was broken because smart people in the 80's said so but... no one has ever been able to give me a concrete reason why.

    RPC, at least as I've always known it, really just boils down to request/response protocols. You send a request, you get a response. While this is admittedly not the only possible networking pattern, it is the dominant one across almost all distributed systems I've worked with. HTTP itself is request/response -- it's basically the same thing.

    All gRPC and Protobuf are doing that is different from HTTP is they're using a binary protocol based on explicitly-defined schemas. The protobuf compiler will take your schemas and generate convenient framework code for you, so you don't have to waste your time on boilerplate HTTP and parsing. And the binary encoding is faster and more compact than text-based encoding like JSON or XML. But this is all convenience and optimization, not fundamentally different on a conceptual level.
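
    (For concreteness, here is a minimal sketch of what that generated convenience looks like from Go. Everything in it is hypothetical: the greeter schema, the pb package, and the service names are made up, but the shape mirrors typical protoc-gen-go / gRPC output.)

      // Sketch only. Assumes a made-up schema compiled to package pb:
      //   service Greeter { rpc SayHello(HelloRequest) returns (HelloReply); }
      //   message HelloRequest { string name = 1; }
      //   message HelloReply   { string message = 1; }
      package main

      import (
          "context"
          "log"
          "time"

          "google.golang.org/grpc"
          "google.golang.org/grpc/credentials/insecure"

          pb "example.com/greeter/gen" // hypothetical generated package
      )

      func main() {
          conn, err := grpc.Dial("localhost:50051",
              grpc.WithTransportCredentials(insecure.NewCredentials()))
          if err != nil {
              log.Fatal(err)
          }
          defer conn.Close()

          ctx, cancel := context.WithTimeout(context.Background(), time.Second)
          defer cancel()

          // The generated stub hides marshaling and HTTP/2 framing; the call
          // reads like any other function that returns a typed value.
          reply, err := pb.NewGreeterClient(conn).SayHello(ctx, &pb.HelloRequest{Name: "world"})
          if err != nil {
              log.Fatal(err)
          }
          log.Println(reply.GetMessage())
      }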

    Neither HTTP nor RPC protocols have ever pretended to solve higher-level distributed systems concerns of fault tolerance, network partitions, reliable delivery, etc. Those are things you build on top. You need a basic mechanism to send messages before you can do any of that.

    What, exactly, is the magical non-RPC approach of the 80's that we're all missing? Can you explain the alternative?

    EDIT: Also, like, the entirety of Google is built out of services that RPC to each other, but the 80's called and said that's wrong? How am I supposed to take this seriously?

    • elcritch 4 years ago

      Perhaps ChuckMcM means the idea of completely transparent RPC? Treating RPC as something you can throw into a program and have it become distributed without designing for it. There seemed to be a time when that was the expectation. gRPC still needs a lot of extra scaffolding to make a real distributed system.

    • sergueif 4 years ago

      The term "RPC" carries a lot of baggage once you unpack it. "Remote" can imply unnecessary coupling to knowing the exact receiver. "Procedure" can imply a hard guarantee that some side effect took place by the time you get a response.

      "non-RPC", as best I can interpret it, means "broadcasting" useful messages / FYIs without much out-of-band coupling and listening for interesting messages. You don't know who's gonna receive the message, what they'll do with it, "when" they'll act on it.

      RPC is inspired by a "procedure call" on a single CPU, which is the complete opposite. In a "procedure call" you know exactly the implementation you're gonna get, when it will be executed, etc.

      You can find glimpses of this in lots of companies, when there's heavy use of a message bus like Kafka. Protobufs as "messages" instead of mere procedure "call" arguments.

      What do you think?

    • ChuckMcM 4 years ago

      The “promise” of RPC has been that you’re calling a procedure from your code that happens to be on a different piece of equipment. It may be in a completely different memory space and on completely different hardware. So “seamless distributed computing.”

      The basic premise being that you specify the interface and you can use tooling to build some skeleton code that makes the code the user writes look like any other code they write, and yet it might magically be running on half a dozen machines.

      Of course, the actual difference is enormous. Invoking a “procedure call” locally is simply a program counter change with the same stack you had before. In the remote case, the parameters provided are marshaled into a canonical form so that the destination can reliably unmarshal and correctly interpret them; the step that had been done by the linker resolving a symbol in your binary is now an active agent using yet another protocol at the start of execution to resolve the symbols and plumb the necessary networking code. And the execution itself may happen exactly as expected, or happen multiple times without you knowing it has done so, or might not happen at all.

      The minimalist camp, of which I consider myself a member, says “No, you can’t make these seamless; they really are just syntactic sugar that lets you specify a network protocol.” In that simple world you acknowledge, and plan for, any part of the process failing. Your code has failure checks and exceptions that deal with “at most once” or “at least once” semantics, and you write your functions to be idempotent when you can, to minimize the penalty of trying to maintain the illusion of procedure call semantics in what is in fact a network protocol implementation.
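
      (To make that concrete, here is a minimal sketch of the “plan for failure” style in Go. Nothing here is from an actual system; the helper and names are made up:)

        package rpcclient

        import (
            "context"
            "fmt"
            "time"
        )

        // callWithRetry is a sketch of treating an RPC as a network operation
        // rather than a local call: it can fail, and a retried call may execute
        // more than once on the server ("at least once" semantics). Retrying is
        // only safe when the wrapped operation is idempotent; otherwise you need
        // dedup keys or "at most once" handling instead.
        func callWithRetry(ctx context.Context, attempts int, call func(context.Context) error) error {
            var err error
            for i := 0; i < attempts; i++ {
                if err = call(ctx); err == nil {
                    return nil
                }
                select {
                case <-ctx.Done():
                    return ctx.Err()
                case <-time.After(time.Duration(i+1) * 100 * time.Millisecond): // crude backoff
                }
            }
            return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
        }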

      But there is another camp, and from the material Buf has put out they seem to be in that camp, which is “networking is hard and complicated, but we can make it so that developers don’t need to even know they are going over a network. Just use these tools to describe what you want to do and we’ll do all the rest.”

      My experience is that obfuscating what is going on under the hood to lower the cognitive load on developers breaks down when you try to distribute systems. That is especially true for languages that don’t explicitly allow for it. The projects, ideas, and companies that have crashed on that reef are numerous.

      And there is this part : “All gRPC and Protobuf are doing that is different from HTTP is they're using a binary protocol based on explicitly-defined schemas. The protobuf compiler will take your schemas and generate convenient framework code for you, so you don't have to waste your time on boilerplate HTTP and parsing. And the binary encoding is faster and more compact than text-based encoding like JSON or XML. But this is all convenience and optimization, not fundamentally different on a conceptual level.”

      I agree 100% with that statement, and that is exactly what ONC RPC does, and that is exactly what ASN.1 does, and that is exactly what DCS does. That same wheel, again and again. So what I was suggesting originally is that Buf should try to explain what they are doing that these other systems failed to do, and in that explanation acknowledge the reasons this wheel has been re-invented so many times before, and then explain how they think they are going to make a more durable solution that lasts for more than a few years.

      • kentonv 4 years ago

        No modern RPC system attempts to transparently emulate a local procedure call. Everyone who is seriously using these systems understands that an RPC is not equivalent to a local call. Everyone understands that the network introduces a host of new failure modes that must be considered, as well as latency and concurrency. RPC systems are used to simplify protocol development but it is understood that these are still protocols and they don't magically solve everything.

        > that is exactly what ONC RPC does, and that is exactly what ASN.1 does, and that is exactly what DCS does.

        Simply put, Protobuf and gRPC do it better. The developer experience is much better. The tooling is much better. The implementation is better-optimized. It's not a new concept, it's just a better implementation. That's all there is to it.

        But anyway, Buf is not re-inventing this wheel, it's just building on Protobuf and gRPC. It seems like your beef is with Protobuf and gRPC, not Buf.

      • xyzzy_plugh 4 years ago

        I don't disagree with anything you're saying but I believe you're missing the forest for the trees. The problem I see Buf and other companies trying to solve isn't RPC so much as IDL. Defining and managing schemas is incredibly painful. It doesn't matter if it's a remote network call, HTTP, binary blob over a socket, files on disk or a call to a local function. Defining, sharing and consuming the boundaries -- the interfaces -- rigorously is the painful part.

        You're too caught up on the implementation details. If you solve the IDL problems, then you can simply change the implementation and no one is the wiser. grpc-gateway is maybe a good example?

        > exactly what ASN.1 does

        Okay I actually do disagree with this. ASN.1 is a whole different ballgame. What doesn't it do, besides the obvious complications leading to buggy and insecure implementations every other day?

        Please don't use ASN.1

      • xtiansimon 4 years ago

        > “So what I was suggesting originally is that Buf should try to explain what they are doing that these other systems failed to do…”

        I was having a discussion online yesterday about writing research papers, and this exact line of argumentation was noted.

        As I recall, the marketing version is called a ‘white paper’.

digitailor 4 years ago

A case study of the pandemic speed of capital:

  May 2020 - $1M Pre-Seed
  Sept 2020 - $3.7M Seed
  April 2021 - $20.7M Series A
  Dec 2021 - $68M Series B

How can so many rounds be condensed so quickly for a business like this? Is the number of Homebrew downloads (37k, as advertised on the home page) a metric that can lead to an 18-month ramp to a Series B now?

I think there's a trend at play here I'd love to hear more about.

  • catsarebetter 4 years ago

    Maybe it's a lot simpler, like they have someone who's ridiculously good at raising venture capital, like the Posthog guys.

    • kentonv 4 years ago

      TBH not really. I've talked to the founder a few times. He doesn't want to put his name on things, as he's kind of a private person and really wants to direct the spotlight at his employees. But he's just an engineer who has spent a lot of time working with Protobufs, not a sales person at all.

      (Disclosure: I was the maintainer of Protobuf who put together the first open source release at Google, and I made a small investment in buf early on.)

      • catsarebetter 4 years ago

        Ok now I'm pretty interested in this company:

        1. Very technical founder at the bleeding edge of the field.
        2. Many people against this idea.

        Usually a sign there's something really good here... or at least something that is super non-obvious to most people.

        I'm curious to learn what I'm missing... mind if I reach out?

        • kentonv 4 years ago

          I think the negativity here is just because fundraising announcements (especially massive ones) tend to attract that kind of response. Admittedly fundraising announcements are not very interesting to an audience wanting to know what the product actually does. But I think what buf is building is not very controversial, it's just developer tooling that obviously should exist and doesn't for some reason, maybe because Google owns Protobuf but Google doesn't really have a strong incentive to make sure external developers have everything they need here.

          Happy to answer emails but can't guarantee I have anything interesting to say.

        • digitailor 4 years ago

          The exuberant behavior of the VCs is really what's being noted. The core business could very well be sound. One just has to hope the very technical founder is being well advised.

          This is a press release, after all.

  • felipellrocha 4 years ago

    While the rest of the world is in a depression, Tech is in a bubble due to a ton of money being shifted over here since there was nowhere else to go.

opendomain 4 years ago

I am on the exact opposite end of the spectrum. I have been promoting json ever since Douglas Crockford discovered it. Even my twitter handle is @json.

If someone wants to use json.com to create a company to promote json - DM me.

  • moralestapia 4 years ago

    100% my thoughts as well. I want to help, but how do I get in touch?

    Shoot me an email, mine's on my HN profile.

    • scrollaway 4 years ago

      > how do I get in touch?

      If I'm deciphering the parent's comment correctly, probably with an HTTP POST to https://json.com/json {"json": "I love JSON!"}

      --- btw @opendomain: the twitter handle on the site is outdated.

hn_throwaway_99 4 years ago

When shit like this happens, I just always think "I don't understand finance at all and never will."

I read the company's primary blog post, https://buf.build/blog/api-design-is-stuck-in-the-past, about "schema driven development" and agree with a lot of it. Which is why I'm a huge fan of GraphQL and related completely free open source libraries, where I define my API endpoints with a strongly typed yet easily evolvable schema and auto-generate my TypeScript types from my GraphQL definitions.

$93 million is just nuts to me.

  • anonymouse008 4 years ago

    I was just about to ask ‘what are the differences between this and GraphQL?’ Maybe tag on an AWS AppSync & DynamoDB, and you have pretty much all of this?

    And before anyone goes all ‘but what about Dropbox?’ when scrutinizing this idea... Dropbox was never really made for technical people; this is aimed squarely at people who know what JSON means, so technical.

  • xyzzy_plugh 4 years ago

    > Which is why I'm a huge fan of GraphQL and related completely free open source libraries

    I don't understand this comparison. Apollo raised $130M this past summer -- doesn't seem that different to TFA. Is that also nuts to you?

    Protocol Buffers and gRPC are also completely free open source libraries. Replace GraphQL with Protobuf and your post is still correct.

    • hn_throwaway_99 4 years ago

      > Apollo raised $130M this past summer -- doesn't seem that different to TFA. Is that also nuts to you?

      Yes, absolutely. I love the Apollo open source libs, and I can certainly see how many customers would choose to pay for their services, but yes, I think $130 million is also nuts.

      Note I did preface my comment with "I don't understand finance at all and never will." so I'm certainly not saying I'm right here.

  • webinvest 4 years ago

    At some point that money will have to be paid back — with interest… but from where? It can’t be free forever.

selfhoster11 4 years ago

Lol. Imagine trying to capture a market that is already mostly happy with what it's got, and for free.

  • gravypod 4 years ago

    Imagine this offering: "Why go through the hassle of generating clients for your service in each language when we can build ergonomic clients automatically?", "Why manually look for breaking changes in your API when we can detect them automatically?", or "We can give you $AMAZING_FEATURE for free by using a clearer language to describe your API", where your feature could be:

    1. Reduced bandwidth ingress
    2. Automatic tracing of PII through your system
    3. Developer-controlled ops stuff (annotating an RPC as cacheable, etc.)
    4. Automated tracing instrumentation
    5. Message streaming (gRPC streams are amazing)

    I can think of a whole host of features that can be built off of protos (I've even built ORMs off of protobufs for simple things [0]). The value prop is there, IMO. HTTP + JSON APIs are a local minimum. The biggest concern, "I want to be able to view the data that is being sent back and forth," is a tooling consideration (curl isn't showing you the voltages from the physical layer; it's decoded). Buf is building that tooling.

    [0] - https://github.com/CaperAi/pronto

    • selfhoster11 4 years ago

      > Why manually look for breaking changes in your API when we can detect them automatically

      You can't detect all breaking changes automatically. A field can subtly shift semantics on an API level, yet that breaks a workflow for some downstream consumer somewhere.

      OpenAPI and other API description languages give a clear and unambiguous description of an API that can auto-generate clients just fine. Binary JSON / gzipped JSON is frequently very space-efficient too. I'm happy to grant the rest of your points, but I cannot see that much value here from an SME perspective. Using tracing and other advanced techniques requires the right knowledge, and I don't think that's common in smaller orgs.

      • gravypod 4 years ago

        > You can't detect all breaking changes automatically. A field can subtly shift semantics on an API level, yet that breaks a workflow for some downstream consumer somewhere.

        That is what aip.dev helps with. Following these style guides makes it hard to have an ambiguous meaning for a field in an API. Linting APIs is something Buf provides.

        This isn't a "catch 100%" thing; it's closer to "catch >XX% of mistakes", which is good enough if it's cheap.

        > OpenAPI and other API description languages give a clear an unambiguous description of an API that can auto-generate clients just fine.

        The last time I used OpenAPI it was very verbose. It wasn't anything like writing code (something I am good at); it was more like writing a large config (something I'm bad at). Protos provide a code-like view of APIs. This is just preference, but I like it.

        > Binary JSON/gzipped JSON is frequently very space efficient too.

        My assumption is that parsing JSON will require more CPU than protos and much more than flatbuffers. More so when you factor in gzip.
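
        (That assumption is cheap to check. Below is a rough benchmark sketch in Go; pb.User is a made-up generated type mirroring the JSON struct, so treat any numbers it produces as illustrative only.)

          package encoding_test

          import (
              "encoding/json"
              "testing"

              "google.golang.org/protobuf/proto"

              pb "example.com/yourapi/gen" // hypothetical generated package with a User message
          )

          type userJSON struct {
              Id    int64  `json:"id"`
              Name  string `json:"name"`
              Email string `json:"email"`
          }

          // Run with `go test -bench=.`; add gzip on the JSON side to measure the
          // compressed case mentioned above.
          func BenchmarkJSONMarshal(b *testing.B) {
              u := userJSON{Id: 1, Name: "ada", Email: "ada@example.com"}
              for i := 0; i < b.N; i++ {
                  if _, err := json.Marshal(&u); err != nil {
                      b.Fatal(err)
                  }
              }
          }

          func BenchmarkProtoMarshal(b *testing.B) {
              u := &pb.User{Id: 1, Name: "ada", Email: "ada@example.com"}
              for i := 0; i < b.N; i++ {
                  if _, err := proto.Marshal(u); err != nil {
                      b.Fatal(err)
                  }
              }
          }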

        > Using tracing and other advanced techniques require the right knowledge to use, and I don't think it's that common in smaller orgs.

        Taking something that's complex and making it easy for people to use sounds like a business prop. Is tracing something people don't do because there's no value or because it's hard to use?

        Basically: you don't want to use protos for any one reason. It's a whole ecosystem that makes many things magically better.

    • BerislavLopac 4 years ago

      These are all good effects of SDD (schema/spec-driven development), but there is nothing in Protobuf that makes it intrinsically better than other solutions like OpenAPI/JSON Schema and others.

  • anm89 4 years ago

    I've got a feeling this has to be some kind of enterprise play. Random devs are not going to pay for a data format.

    • travisd 4 years ago

      Their whole thing is tooling around protocol buffers, not the data format itself. The format is completely open source and comes from Google.

      • zozbot234 4 years ago

        Protobuf is not even all that good as a format; FlatBuffers generally has better properties.

        • zaphirplane 4 years ago

          My impression is that FlatBuffers takes more CPU because it doesn’t transform the binary payload (that’s its use case).

  • Guest42 4 years ago

    And then to provide ROI on $93M.

pixelgeek 4 years ago

Their PR really puts me off. Makes me think the authors are sneering at everyone who uses JSON. And who says they get to deprecate anything?

  • Guest42 4 years ago

    It reminds me a bit of academic papers where an author chooses a specific corner case of a topic and then beats it up in a rather contrived manner.

duxup 4 years ago

The idea of "I'm raising money to get people to stop using REST/JSON" seems kinda weird to me. I get that they have a product and all, but the general lead-in here strikes me as odd.

mbrodersen 4 years ago

Yet another non-business grabbing $ from clueless investors. Surely there must be a word for the “we are grabbing $ from clueless investors” business model? WeWork is the poster child for that one.

  • kentonv 4 years ago

    I mean... I'm one of the investors... I'm also the former maintainer of Protobuf and former startup founder myself... but I could be clueless, yeah.

    • tlackemann 4 years ago

      With all respect, I hope you don't have a lot of capital locked in this. Who is this company targeting? Google?

      My company uses gRPC and it's an absolute nightmare, but not so much of one that we'd use a company like this to add MORE costs to our infrastructure.

      It baffles me that people choose buzzword technology because "ex-Googler" or whatever, when 99% of the companies that choose it will NEVER hit the scale it was meant for. Best of luck to the sales team. They'll be the driving force, I'm sure.

      REST is fine for 99% of companies. Long live REST.

      • kentonv 4 years ago

        I don't see Buf's play as being about scale or performance, but rather developer experience. I think that with the right tooling, the developer experience of strong schemas with code generators can far surpass JSON/REST. If that happens, then the performance/scalability benefit is just a bonus.

        Will people pay for it? That's not my area of expertise. But, I would note that Vagrant started out as a collection of Ruby scripts for wrangling existing VM products, which probably few people imagined would be something people would pay for. And just this morning, Hashicorp went public at a market cap of $18.5B. It is possible to build a business around developer tools. Not easy, certainly, but possible.

        > I hope you don't have a lot of capital locked in this

        Like any intelligent angel investor, I always assume I'll lose 100% of my investment and size them appropriately.

        • tlackemann 4 years ago

          I wasn't really talking about Buf at scale but rather how gRPC is just a buzzword technology that companies get sucked into adopting for the sake of "scale". I've yet to see gRPC used in a way that makes sense - it's just added complexity for an org that should've been a monolith to begin with.

          Maybe I'm an old curmudgeon but I don't see the point, at all.

    • mbrodersen 4 years ago

      If you are clueless then you are in good company. Lots of self-proclaimed smart investors lose their shirts every day investing in things that are fundamentally scams pretending to be businesses.

PhoenixReborn 4 years ago

The $93M number in the headline is somewhat misleading, as it's cumulative across all the rounds of funding. Can the title be changed to state that the specific round raised now is a $68M Series B?

Article quote:

> We just closed a $68M Series B co-led By Lux and Tiger Global, with participation from Greenoaks Capital Partners, Lightspeed, Addition, and Haystack.

bfung 4 years ago

I’m not up to date with protobuf, but the last time I reviewed it, around 4 years ago, it still wasn’t as good at describing schemas as Avro, especially at describing schema changes.

Good luck to buf.build selling a revamped WSDL and getting everyone to adopt it.

aogaili 4 years ago

$93M... oh no, soon we will be flooded with ads and Steve Jobs-style keynotes on why REST sucks. I thought we were done with those after the GraphQL hype slowed down, but I guess not. Please give it a... rest.

catsarebetter 4 years ago

Hmm why do they need to raise so much cash every 9 months?

Also does anyone here use them and have any thoughts about their product?

  • kentonv 4 years ago

    > Hmm why do they need to raise so much cash every 9 months?

    Well, they say the best time to take investment is when you don't need it. If you wait until you need it then the terms will be worse. If investors are offering you money when you don't need it, it may be the best time to accept it.

    The sequence of raises here looks fairly normal for a growing startup, except that they happened much closer together than would be typical. The terms aren't shown, but assuming they are in line with a typical sequence, this is a great outcome for Buf, as it gives them lots of room to build their vision without needing to stress over money for a while.

    > Also does anyone here use them and have any thoughts about their product?

    FWIW, long ago I was the maintainer of Protobuf at Google, including putting together the first open source release. I like what buf is doing -- enough that, full disclosure, I made an angel investment in their seed round.

    There's a huge amount of room for better tooling around Protobuf. Binary and strongly-typed protocols require strong tooling to be usable, but with tooling they can be much better than dynamic and text-based approaches. Like, the fact that the protocol is binary shouldn't make it any harder for a human to read it, because your tools should decode it for you on-demand as easily as you could `cat` a text file. Protobuf historically has had sort of the bare minimum tooling and required a lot of ad hoc copying proto files around between projects to get anywhere, which was a pain. A registry seems like the right first step to making things easier, but I'm really excited about what can be done after that... once you have strong type information, you can have tools to dynamically explore APIs, trace communications, etc.
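
    (As a minimal sketch of what "decode it for you on demand" can look like from Go: pb.User and the file name are made up, and real tooling would look the message type up from a registry rather than hard-coding it.)

      package main

      import (
          "fmt"
          "log"
          "os"

          "google.golang.org/protobuf/encoding/prototext"
          "google.golang.org/protobuf/proto"

          pb "example.com/yourapi/gen" // hypothetical generated package
      )

      // With the schema in hand, "reading" a binary payload is one decode away,
      // roughly the tooling equivalent of `cat` for a text format.
      func main() {
          raw, err := os.ReadFile("user.bin") // a serialized pb.User, for illustration
          if err != nil {
              log.Fatal(err)
          }
          var u pb.User
          if err := proto.Unmarshal(raw, &u); err != nil {
              log.Fatal(err)
          }
          fmt.Println(prototext.Format(&u)) // human-readable text form of the message
      }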

    • gravypod 4 years ago

      > FWIW, long ago I was the maintainer of Protobuf at Google, including putting together the first open source release. I like what buf is doing -- enough that, full disclosure, I made an angel investment in their seed round.

      I know this might not be the best way to ask but have they considered creating proto rules for Bazel? The existing proto + gRPC story is pretty unfortunate.

    • catsarebetter 4 years ago

      You could even build a marketplace on top of the first layer for devs to build their own tooling...

      Hmm interesting, I need to go read some books, gimme a few weeks to come up with an intelligent response, thanks for your insight

    • dsizzle 4 years ago

      Would there be a free reader? What is their business model?

  • elzbardico 4 years ago

    JSON is a powerful enemy, it takes lots of money to wage war against such a cunning opponent

  • selfhoster11 4 years ago

    I don't even know why they need so much money for what they do.

    Based on their website, they solve the following problems:

    - a central schema registry. Even if that's something you actually want, it's not a problem that requires $93M to solve, or a commercial company to operate

    - communicating schema changes primarily via human-oriented channels like handwritten documentation or emails. I mean sure, if you are a masochist (or a sufficiently inefficient org), you might do just that. The rest of us here in the 21st century can check the schema into a Git repo instead.

    - schema drift on the client end. Tough, this is what happens when you write software. Adding a third party won't help here.

    - dependency management. For APIs? I cannot imagine a single case where that would help and your API isn't already a monstrosity.

    • catsarebetter 4 years ago

      Hmm well I get that we're in an asset bubble with startups, but if this company is creating jobs, then I think it's somewhat reasonable to assume that they are fixing a problem ppl are willing to pay for.

      I do agree with what you're saying from my perspective as an average swe that doesn't do anything highly specialized...

      I wonder, though, what the workflows would look like for a SWE, product, or even ops person where schema changes get passed around and edited so much that this would be necessary (I'm making some assumptions here, comparing a central schema registry to CRM use cases)...

      Wonder what the shape of the problem looks like...

      • ByteJockey 4 years ago

        Is there enough money to recoup the investment if they don't get the average engineer churning out CRUD apps?

        I mean, there probably is, but what kind of market penetration would this require if you are only going after the specialists?

jokethrowaway 4 years ago

We already have schema based validation in tons of different shapes.

A binary format may save some bandwidth and be slightly harder to reverse engineer, at the cost of not being easily introspectable out of the box during development.

I don't think there is enough value to sell something.

I hope it gains traction and cargo-culting companies with bored engineers start using them, so hopefully the next company I work with won't have some terribly complicated and unusable GraphQL but just protobufs.

ghostwriter 4 years ago

https://reasonablypolymorphic.com/blog/protos-are-wrong/inde...

For those who still want / need binary protocols and schemas, look at FlatBuffers or Cap'n Proto instead. At least they are capable of representing domain structures properly.

  • kentonv 4 years ago

    Sorry, I'm the author of Cap'n Proto and I think that article is full of shit.

    My previous commentary: https://news.ycombinator.com/item?id=18190005

    • ghostwriter 4 years ago

      Thanks for Cap'n Proto. It's better than Protobuf, but I prefer FlatBuffers even more. I think the article clearly points out issues that the wider community working with conventional type systems in mainstream languages is not fully aware of. And I disagree with your comments. Firstly, I don't like that you are labelling the author of the article as a "PL design theorist who doesn't have a clue" (my interpretation applied):

      > his article appears to be written by a programming language design theorist who, unfortunately, does not understand (or, perhaps, does not value) practical software engineering.

      I'm not the author, but they mention their prior industrial experience with protobufs at Google, among other unnamed places.

      I'm not a PL theorist either, and I see that you don't fully understand the problems of composability, compatibility, and versioning and are too eager to dismiss them based on your prior experience with inferior type systems. Here's why I think that is the case:

      > > This is especially true when it comes to protocols, because in a distributed system, you cannot update both sides of a protocol simultaneously. I have found that type theorists tend to promote "version negotiation" schemes where the two sides agree on one rigid protocol to follow, but this is extremely painful in practice: you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code. Inevitably, developers are pushed towards hacks in order to avoid protocol changes, which makes things worse.

      You are conflating your experience with particular conventional tooling with the general availability of superior type systems and tooling out there. There is high demand for using their properties in protocol design today, where most of the currently popular protocols hamper type systems for no good reason (no productivity gain, no performance gain, no resource utilisation gain).

      Version negotiation is not the only option available to a protocol designer. It is possible to use implicit-for-client and explicit-for-developer strategies for schema migration. It is also possible to semi-automate inference of those strategies. Example: [1]

      > This seems to miss the point of optional fields. Optional fields are not primarily about nullability but about compatibility. Protobuf's single most important feature is the ability to add new fields over time while maintaining compatibility.

      There are at least two ways to achieve compatibility, and optional fields that expand a domain type to the least common denominator of all encompassing possibilities are the wrong solution. Schema evolution via unions, versioning, and migrations is the proper approach, and it allows strict resolution of compatibility issues at whatever level of granularity (distinct code paths) you like.
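
      (A minimal sketch of that style in Go terms; the v1 and v2 packages are hypothetical generated code for two schema versions, and the point is that the migration is one explicit, reviewed code path rather than optionality scattered through every consumer.)

        package migrate

        import (
            v1 "example.com/api/gen/v1" // hypothetical generated package for the old schema
            v2 "example.com/api/gen/v2" // hypothetical generated package for the new schema
        )

        // MigrateUser sketches schema evolution via explicit versioning: the field
        // that is required in v2 gets a deliberate, reviewed value here, instead of
        // every consumer coping with a silent zero value.
        func MigrateUser(old *v1.User) *v2.User {
            return &v2.User{
                Id:     old.GetId(),
                Name:   old.GetName(),
                Locale: "en", // explicit default chosen once, during migration
            }
        }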

      > Real-world practice has also shown that quite often, fields that originally seemed to be "required" turn out to be optional over time, hence the "required considered harmful" manifesto. In practice, you want to declare all fields optional to give yourself maximum flexibility for change.

      This is false. In practice I want schema versioning and deprecation policies, not an ever-growing domain expansion into a blob of all-optional data.

      > It's that way because the "oneof" pattern long-predates the "oneof" language construct. A "oneof" is actually syntax sugar for a bunch of "optional" fields where exactly one is expected to be filled in.

      This is not true either, and it doesn't matter which pattern predates which. Tagged unions are neither a language construct nor syntax sugar; they are a property of type algebra, where you have union and product compositions. Languages that implement type algebra don't do it just to add another fancy construct; they do it to benefit from the mathematical foundations of these concepts.

      > How do you make this change without breaking compatibility?

      You version it and migrate over time at your own pace, without bothering your clients too often [1]

      [1] https://github.com/typeable/schematic#migrations

      • kentonv 4 years ago

        > I see that you don't fully understand the problems of composability, compatibility, and versioning and are too eager to dismiss them based on your prior experience with inferior type systems.

        > You are conflating your experience with particular conventional tooling with a general availability of superior type systems and toolings out there.

        You literally quoted my project as one of your two examples of superior systems and now you're telling me I don't understand how superior systems work because I have no experience with them?

        • ghostwriter 4 years ago

          These are not mutually exclusive things, as the superiority of such systems is a multi-dimensional metric. I quoted Cap'n Proto as an alternative to Protobuf that I would definitely choose over any Protobuf, because in my book it does at least a few things better, namely the bits related to immutability, zero-copying, and random access. But at the same time I do not like and do not agree with your stance on field optionality, as I think it is based on the false premise that universal optionality is the only viable path towards compatibility. I will cite the original article on the matter to clarify this point:

          > protobuffers achieve their promised time-traveling compatibility guarantees by silently doing the wrong thing by default. Of course, the cautious programmer can (and should) write code that performs sanity checks on received protobuffers. But if at every use-site you need to write defensive checks ensuring your data is sane, maybe that just means your deserialization step was too permissive. All you’ve managed to do is decentralize sanity-checking logic from a well-defined boundary and push the responsibility of doing it throughout your entire codebase.

          This approach doesn't free you as a developer from having to maintain multiple code-paths as you claim to be able to avoid in your older comments ("you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code").

          The code paths are still there; they are now intertwined with your business logic as conditional checks on field presence, at every call site that uses the schema.

          That's one of the reasons why I prefer FlatBuffers over Cap'n Proto when I have a choice, and it is the reason why I think you are not fully aware of the issues that stem from Protobuf's choices and that are clearly manifested in ecosystems that model network communications with advanced type systems.

          In fact, this comment from your linked thread suggests a similar idea - advanced type systems can provide a strict schema negotiation in semi-automated way, at a fraction of the effort required to maintain schemas with all-optional fields - https://news.ycombinator.com/item?id=18201601

          • kentonv 4 years ago

            The "required fields considered harmful" opinion was a hard lesson learned through real experience -- the experience of repeated outages of large, complex systems like Google Search, GMail, etc. Certainly, prior to this experience, everybody assumed required fields were a good idea.

            More abstractly, the hard lesson was: In a large distributed system, the site of use is the only reasonable place to do data validation. If you do it anywhere else, you will create a more brittle system that can't handle changes. The reason is pretty straightforward, but is more of a human reason than a mathematical one: when someone decides to modify a protocol for some new feature, they know they obviously have to modify the code that produces and consumes the protocol in order to implement the feature. But if they have to update a bunch of other places too, that's at best more work, and at worst easily forgotten. It's really important that any part of the system that is just a middleman will be agnostic to the data and pass it through unmodified -- even if the data is based on a newer version of the schema than the middleman is aware of.

            So yes, you actually want the validation to be in your business logic. But you don't want it to complicate that business logic too much. Most of the time, optional fields (with default values) provide the right balance between making changes easy without making code ugly. Sometimes, a more drastic change -- like declaring a new version of the protocol and writing translation layers -- is a good idea, but this is an expensive step that you want to do rarely.
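
            (In Go terms, a minimal sketch of that balance; pb.CreateUserRequest and createUser are made up. Validation lives in the handler that needs the data, and a middleman that never touches these fields can pass them through even if it was built against an older schema.)

              package api

              import (
                  "fmt"

                  pb "example.com/yourapi/gen" // hypothetical generated package
              )

              func handleCreateUser(req *pb.CreateUserRequest) error {
                  if req.GetEmail() == "" { // unset proto3 scalars arrive as zero values
                      return fmt.Errorf("this handler needs an email")
                  }
                  region := req.GetRegion()
                  if region == "" {
                      region = "us-east-1" // default chosen at the site of use, not in the schema
                  }
                  return createUser(req.GetEmail(), region)
              }

              func createUser(email, region string) error { return nil } // stub for the sketch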

            Now, obviously you don't agree with this. But your arguments sound like they are coming from a place of intuition, not experience. That's fine; intuition is critical to innovation. But you can't go around claiming your intuition is "superior" without proving it out in practice. Intuition is always based on a simplified model in your head, and the real world often doesn't work like you think it will. I assure you, you don't know anything I don't; in decades of working on this stuff I've heard all the ideas. The only way to prove yourself right is to actually build systems your way and show success in the field. Of course, there will likely never be a definitive proof that one idea or the other is superior, only anecdotal experience. However, the fact that a large majority of successful distributed systems today are built on Protobuf or a similar model suggests that experience leans heavily in that model's favor.

            • ghostwriter 4 years ago

              > The "required fields considered harmful" opinion was a hard lesson learned through real experience -- the experience of repeated outages of large, complex systems like Google Search, GMail, etc. Certainly, prior to this experience, everybody assumed required fields were a good idea.

              There is a common trait in the systems you mentioned: they generally allow for a permissive representation of domain data, where many of the fields can be omitted or replaced by zero values / defaults, because most of them, by their nature, deal with things that are optional and tolerant of noise and accidental mistakes (percentile precision). How much A/B test data and user-tracking stats do Gmail / Google Search encode and process as protobuf?

              If you compare that to a simulation engine's data stream or a collaborative BIM / CAD model, you will find that almost everything that travels over the network in these systems is required to be unambiguous and strictly consistent at the sending and receiving sites. The binary representations of physical relations in these models are not just scalar values that can tolerate a default assigned by a protocol parser upon receiving a missing field. Scalar values appear at UI rendering / output formatting; most of the time you deal with relations and equations, and you need to be able to differentiate between missing-by-intent and missing-by-mistake. Zero values will not help either, because a zero value itself can be represented in multiple ways depending on the model being evaluated and the context it's evaluated in; the values can legitimately come in different precisions, units, and ratios (discrete vs dense), and those are not distinct fields; their combinations are often mutually exclusive. This is not the kind of validation you want to delegate to call sites implemented in different languages and maintained by teams with differing capacity to solve the challenge of proper validation. The invariants and constraints have to be encoded into the protocol, and required fields are a low-level "must have" bit of that.

  • onionisafruit 4 years ago

    That’s a good idea. I bet the creator of cap’n proto would tell us what a bad idea this is. What does kentonv have to say?

    • ghostwriter 4 years ago

      > I bet the creator of cap’n proto would tell us what a bad idea this is

      You can draw your own conclusion based on the provided arguments and some additional exploratory work. Someone else's opinion is good but optional and is not always as insightful as your own discoveries.

    • kentonv 4 years ago

      Heyo

anm89 4 years ago

Nice. So, SOAP?

4kelly 4 years ago

I can highly vouch for Buf’s protobuf linter and breaking-change detector [1]. It’s open source.

IMO, generating gRPC code in multiple languages is pretty tedious to set up and maintain. Buf has the potential to replace / free up a lot of time for a small team of people maintaining this sort of thing in-house.

- person who helps manage a protobuf monorepo.

[1] https://docs.buf.build/tour/detect-breaking-changes

Hnrobert42 4 years ago

Raised that much at what valuation?

OJFord 4 years ago

The original, significantly less sensational/PR-y and more appropriately mundane title is 'An update on our fundraising'.

eps 4 years ago

Yay, an XML v2.

rickstanley 4 years ago

The "™" symbol is so small I thought it was dirt and tried to remove it. Lol.
