gRPC: 5 Years Later, Is It Still Worth It?

kostyay.com

69 points by sudorandom a year ago · 36 comments

phillipcarter a year ago

I've been working on OpenTelemetry for a few years now, and regularly field questions from end-users wondering why data doesn't seem to end up where they want it. I'd say that at least half the time, switching from the gRPC exporter to the HTTP exporter simply solves it.

  • flashgordon a year ago

    Interesting (as someone just getting into otel) - what is causing grpc exporters to have this loss? Is this loss more by using grpc over the internet vs http or even internally within a network/vpc?

    • phillipcarter a year ago

      The exporters themselves are fine; it's usually something in the user's own network where gRPC traffic gets dropped somehow. Sometimes it's a load balancer, sometimes it's something else.
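
For reference, the switch phillipcarter describes is usually a small change at SDK setup time. A minimal Go sketch using the OTLP trace exporters is below; the collector endpoint is illustrative, and this is only one way to wire it up:

    package main

    import (
        "context"
        "log"

        "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
    )

    func main() {
        ctx := context.Background()

        // OTLP over HTTP/protobuf (port 4318) instead of gRPC (port 4317).
        // Plain HTTPS tends to survive middleboxes that mishandle HTTP/2 or gRPC.
        exp, err := otlptracehttp.New(ctx,
            otlptracehttp.WithEndpoint("collector.example.com:4318"), // illustrative endpoint
        )
        if err != nil {
            log.Fatal(err)
        }

        tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
        defer func() { _ = tp.Shutdown(ctx) }()
        // ... set tp as the global tracer provider and instrument as usual ...
    }

Several OTel SDKs also honor the OTEL_EXPORTER_OTLP_PROTOCOL environment variable (e.g. http/protobuf), which achieves the same switch through configuration rather than code.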

skrtskrt a year ago

In years of developing pretty complex products with gRPC I don't think we've ever run into an issue or sharp corner.

Having the libraries generate relatively optimized message parsers and server implementations, and just throwing middlewares around them - with easy support for deprecating fields, enums, and a bunch of other goodies - has been a huge help and productivity gain. So much can be done just by understanding the gRPC config settings and throwing some bog-standard middlewares around things.

  • kostyay a year ago

    Same experience for me as well. It does what it should, and the clients and servers it generates in Go are pretty good.
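
To illustrate the "bog-standard middleware" pattern skrtskrt mentions, here is a minimal grpc-go sketch of a logging interceptor wrapped around a server; the service registration is elided and the port is illustrative:

    package main

    import (
        "context"
        "log"
        "net"
        "time"

        "google.golang.org/grpc"
    )

    // loggingInterceptor is a typical unary middleware: it times every call and
    // logs the method and error without touching any generated code.
    func loggingInterceptor(
        ctx context.Context,
        req any,
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (any, error) {
        start := time.Now()
        resp, err := handler(ctx, req)
        log.Printf("%s took=%s err=%v", info.FullMethod, time.Since(start), err)
        return resp, err
    }

    func main() {
        srv := grpc.NewServer(grpc.ChainUnaryInterceptor(loggingInterceptor))
        // pb.RegisterYourServiceServer(srv, &yourService{}) // generated registration, elided

        lis, err := net.Listen("tcp", ":50051")
        if err != nil {
            log.Fatal(err)
        }
        log.Fatal(srv.Serve(lis))
    }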

danans a year ago

> We wanted to eliminate the need for each engineer to individually install protoc and multiple plugins and to make generating Go or TypeScript assets a single command a developer could execute.

Couldn't you just make protoc part of your project's git repo?

> This approach ensured that everyone on the team was using identical versions of these tools throughout their development process.

While this does enforce using the same version during development, it also introduces possible differences between code built during development and code built for production (if those are separate processes, as they should be in mature software).

Ideally, production builds should be hermetic and their inputs should only come from the committed source code and tools, not externally hosted ones that evolve independently.

Granted, with a tool as stable (and with such strictly defined interfaces) as protoc, perhaps the risk is minimal, but IMO this isn't a generalizable architecture.

  • kostyay a year ago

    We usually commit the generated clients/servers to the repo and expect developers to run the code generation on their machines.

    • danans a year ago

      What's the advantage of calling a locally running server vs running the codegen directly on the command line?
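
For reference, one common way to make codegen a single repeatable command (whether or not a local server or container is involved) is a committed go:generate directive; a minimal sketch, with illustrative proto paths and plugin flags:

    // Package api keeps the codegen invocation in the repo next to its output,
    // so `go generate ./...` regenerates everything in one command.
    package api

    //go:generate protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative users/service.proto

This still assumes each machine has compatible versions of protoc, protoc-gen-go, and protoc-gen-go-grpc installed, which is exactly the reproducibility concern raised above.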

andrewl-hn a year ago

I haven't done any gRPC programming, but from what I understand it's a modern re-imagining of XML Web Services: you describe the contract and can generate client and server code based on it for different programming languages.

Early on in my career I worked on several projects that used the WS-* "stack", and generally it was a very good experience. Once we had a project that was split between two subcontractor teams, and each team worked on their portion of the system (a .NET application and a J2EE server) with an API based on a common WSDL spec. The two teams (dozens of engineers on each side) worked independently for about a year, and after that they tried to run the two subsystems together, and it was really cool to see the parts "just click". There were some minor issues (like one side expected UTC timestamps and the other was sending localized time), but they took really little time to fix. The fact that the two teams were not really talking to each other, were using different languages and libraries, relied on some manual testing through SoapUI and some mocks, and yet the whole thing even ran on the first attempt, was very, very impressive!

WS-* was heavily criticized at the time: the standards and data formats were convoluted, the tooling beyond .NET and JVM was almost non-existent, Sun and Microsoft were not following the standards in their implementations and cared more about the interop with each other than about being standard-compliant. So, ultimately REST and JSON pushed the whole thing away. But I'm really happy to see people trying to replicate what was great about Web Services without making old mistakes, and I wish everyone involved all the best.

Which brings me to my actual question. Since software development history repeats / rhymes with itself every decade or two, I now wonder whether XML Web Services were really the first iteration of this formula. Was there another popular technology in the 70s, 80s, or 90s that had people describe an RPC contract and then used it to generate client and server glue code?

I know that both COM and CORBA used IDL to describe API, but I don't remember any code generation involved.

  • wahern a year ago

    In the 80s there was RPC/XDR (https://en.wikipedia.org/wiki/Sun_RPC, https://en.wikipedia.org/wiki/External_Data_Representation), which was used pretty heavily in the Unix world. On most Unix systems today, including Linux and macOS, you'll find this suite installed. Try `man rpc` and `man xdr`. XDR is what protocols like NFS are based upon.

    A contemporaneous competitor to Sun RPC was DCE/RPC (https://en.wikipedia.org/wiki/DCE/RPC), which I think Microsoft's SMB protocol was based upon, albeit in Microsoft's trademark manner--embrace, extend, extinguish.

    None of these require compilers, but you're usually better off using them, especially for serialization and deserialization, regardless of whether you're using a specialized library. On Unix there are libraries that can be used in an ad hoc fashion for RPC/XDR, but there is also rpcgen (`man rpcgen`).

  • atombender a year ago

    Sun RPC — the official name was ONC RPC — was probably the first modern RPC. It's an Internet standard [1]. You've probably used it without realizing it; it's the protocol that NFS uses. If you've ever had to deal with the NFS "portmapper", then that's because of Sun RPC. Some other protocols use it.

    It uses XDR as the schema definition language. XDR is basically analogous to Protobuf files. It has structs and tagged unions and so on.

    Another technology from around the same time was DCE/RPC [2], which Microsoft adapted wholesale as MSRPC. Windows used it extensively around the time of NT 3.x for products like Exchange Server, and I believe it's still in wide use. DCE/RPC has its own IDL; you used the compiler to generate the stub implementations, just like Protobuf/gRPC.

    Microsoft COM uses DCE/RPC under the hood, with lots of extensions [3]. CORBA emerged around the same time as DCE/RPC and COM and is roughly analogous in functionality.

    COM and CORBA are explicitly object-oriented. While protocols like DCE/RPC and gRPC return values that are pure data, such as primitives and structs, COM and CORBA can return interfaces. An interface pretends to be a local in-memory instance, but its methods are "stubs" that invoke the underlying RPC call to execute them remotely. Methods can also return functions, which means you have whole trees of objects which are remote. Adding to that, both COM and CORBA use reference counting to hold onto objects, so if a client has received a remote object and reference counted it, the server needs to keep it around until the client either releases the refcount, or the client dies. COM and CORBA called this referential transparency, in that any object could be either local or remote, and a consumer of the interface didn't need to know about it. Of course, while nicely magical, this leads to a lot of complexity. I developed a rather complex distributed DCOM application in the late 1990s, and while it did, inexplicably, work quite well, it was also a nightmare to debug and keep stable.

    While COM is alive and well in Windows these days (and interestingly enough, some APIs like DirectX use the COM pattern of defining interfaces via IUnknown etc., but are not actually true COM), DCOM turned out to be a mistake, and CORBA failed for some of the same reasons, although for many reasons unique to CORBA as well. CORBA made tons of design mistakes.

    SOAP and WS-*, and of course XML-RPC and JSON-RPC, came later. The wheel has been reinvented many times.

    [1] https://datatracker.ietf.org/wg/oncrpc/about/

    [2] https://en.wikipedia.org/wiki/DCE/RPC

    [3] https://learn.microsoft.com/en-us/openspecs/windows_protocol...

jauntywundrkind a year ago

It drives me bonkers that HTTP packs so many awesome capabilities that enable so much awesome gRPC stuff.

Then web browsers never implement any of those capabilities.

gRPC-web's roadmap is still a huge pile of workarounds they intend to build. These shouldn't be necessary! https://github.com/grpc/grpc-web/blob/master/doc/roadmap.md

Instead of giving us HTTP push, everyone said: oh, we haven't figured out how to use it well for content delivery, and we've never let anyone else use it for anything. So it got canned.

HTTP trailers also seem to have no support, afaik.

That the web pours so much into HTML, JS, and CSS but utterly neglects HTTP, to the degree that grpc-web will probably end up tunneling HTTP over WebTransport, is so cursed.

jrockway a year ago

Things I've learned using gRPC for ~10 years:

1. I really like gRPC and protos for the codegen capabilities. APIs I've worked on that use gRPC have always been really easy to extend. I am always tempted to hand-roll http.Handle("/foo", func(w http.ResponseWriter, req *http.Request) { ... }), but gRPC is even easier than this.

2. grpc-web never worked well for me. It's hard to debug; in the browser's inspector you just have serialized protos instead of JSON, and developers find that painful. Few people know about the `protoc --decode[_raw]` options. (This comes up a lot when working with protos; you have binary or base64-encoded protos but just want key/value pairs. I ended up adding a command to our CLI to do this for you.)

I also thought the client side of the equation was a little too bloated. webpack and friends never tree-shook out code we didn't call, and as a result, the client bundle was pretty giant for how simple of an app we had. There are also too many protoc plugins for the frontend, and I feel like whenever a team goes looking for one, they pick the wrong one. I am sure I have picked the wrong one multiple times. After many attempts at my last job, I found the sane Typescript one. But at my current job, a team started using gRPC and picked the other one, which caused them a lot of pain.

3. grpc-gateway works pretty well, though. Like grpc-web, it suffers from promised HTTP features never being implemented, so it can't implement all of gRPC. (gRPC is really just bidirectional RPCs, but you can restrict what you do to have unary RPCs, server streaming RPCs, client streaming RPCs, and bidirectional RPCs. The web really only handles unary and server streaming. grpc-gateway doesn't remove these grpc-web limitations.)

But overall, I like it a lot for REST APIs. If I were building a new REST API from scratch today, it would be gRPC + grpc-gateway. I like protos for specifying how the API works, and grpc-gateway turns it into a Swagger file that normal developers can understand. No complaints whatsoever with any of it. (buf is unnecessary, and I feel like they just PR'd themselves into the documentation to sound like it's required, but honestly if it helps people, good for them. I just have a hand-crafted protoc invocation that works perfectly.)

4. For plain server-to-server communication, you'd expect gRPC to work fine, but you learn that there are middleboxes that still don't support HTTP/2. One problem that we have is that our CLI uses gRPC to talk to our server. Customers self-host all of this, and often work at companies that break gRPC because their middleboxes don't support HTTP/2. (I'll point out here that HTTP/3 is the current version of HTTP.) We have Zscaler at work and this mostly affects our internal customers. (We got acquired and had these conditions added 8 years into the development cycle, so we didn't anticipate them, obviously.) But if we were starting all over today, I'd use grpc-gateway-over-http1.1 instead of grpc-over-http2. The API would adjust accordingly; I wouldn't have bidirectional RPCs, but RPCs that simulate them. Something like a create session RPC, then a unary call to add another message to the session, and a unary call that returns when a message is ready. It sucks, but that's all that HTTP/1.1 in the browser really offers, and that's the maximum web compatibility level that works in Corporate America these days.

5. Some details are really confusing and opaque to end users trying to debug things. Someone sets up a proxy, connects to dns:///example.com, and the proxy doesn't work properly. This is because gRPC resolves example.com and dials the returned IP addresses, setting :authority to the IP addresses and not the hostname. You have to use passthrough:///example.com to have the HTTP machinery make an HTTP request for example.com/foopb/Foo.Method. Maybe this is Go-specific, but it always confuses people. It's a few too many features available out of the box that, again, work great on networks you control but poorly on networks your employer controls.
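
A minimal grpc-go sketch of the target-scheme difference in point 5; the hostname is taken from the example above and the credentials setup is illustrative:

    package main

    import (
        "crypto/tls"
        "log"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
    )

    func main() {
        creds := grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{}))

        // dns:/// (the default): gRPC resolves example.com itself and dials the
        // returned addresses, the behavior described above that can surprise
        // hostname-keyed proxies.
        dnsConn, err := grpc.Dial("dns:///example.com:443", creds)
        if err != nil {
            log.Fatal(err)
        }
        defer dnsConn.Close()

        // passthrough:///: the target string is handed to the dialer untouched,
        // so the request machinery sees example.com:443 as a hostname.
        proxyConn, err := grpc.Dial("passthrough:///example.com:443", creds)
        if err != nil {
            log.Fatal(err)
        }
        defer proxyConn.Close()
    }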

  • kostyay a year ago

    Thanks for the write up.

    > grpc-web never worked well for me.

    connectrpc is a good alternative for the client-side libraries. The code it generates is much better than what grpc-web (and the 3-4 different plugins on the market for it) produces. It also supports the grpc-web encoding out of the box. We are still using grpc-web internally, just with connectrpc-generated clients. The frontend engineers were extremely happy when we moved from the grpc-web clients to the connectrpc ones. The DX is much better.

    > grpc-gateway works pretty well, though.

    I think grpc-gateway is a decent choice if you are building a REST API for internal use. It does feel like an unfinished product (well, it is a community effort). Specifically, if you decide to follow Google's AIP guidelines (https://google.aip.dev/general), you may find that some things are not implemented. Another downside is that it can't produce OpenAPI v3.
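
For context on the grpc-gateway setup discussed above, wiring it in front of an existing gRPC server is fairly small. A minimal Go sketch, assuming a hypothetical Greeter service whose gateway stub was produced by protoc-gen-grpc-gateway (the Register... function name follows the generated naming pattern, and the package path and addresses are illustrative):

    package main

    import (
        "context"
        "log"
        "net/http"

        "github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"

        gw "example.com/gen/greeterpb" // hypothetical generated gateway package
    )

    func main() {
        ctx := context.Background()

        // The gateway translates incoming JSON/REST requests into gRPC calls
        // against the backend endpoint registered here.
        mux := runtime.NewServeMux()
        opts := []grpc.DialOption{grpc.WithTransportCredentials(insecure.NewCredentials())}
        if err := gw.RegisterGreeterHandlerFromEndpoint(ctx, mux, "localhost:50051", opts); err != nil {
            log.Fatal(err)
        }

        // Plain HTTP/1.1 (plus a Swagger/OpenAPI v2 file) on the outside, gRPC inside.
        log.Fatal(http.ListenAndServe(":8080", mux))
    }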

  • azophy_2 a year ago

    this is a very detailed review. thanks for sharing this

  • citizenpaul a year ago

    Very informative thanks

dastbe a year ago

My issue with gRPC is that, looking at https://github.com/grpc/proposal, all of the proposals are xDS-related, and it's not clear to me why xDS is being pushed so heavily outside of potential GCP interests re: Traffic Director. Is there really nothing else to work on here?

jameskilton a year ago

I love protobufs as a type-safe way of defining messages and providing auto-generated clients across languages.

I can't stand gRPC. It's such a Google-developed product and protocol that trying to use it in a simpler system (e.g. everyone else's) is frustrating at best and infuriating at worst. Everything is custom and different from what you're expecting to deal with, when at its core it is still just HTTP.

Something like Twirp (https://github.com/twitchtv/twirp) is so much better. Use existing transports and protocols, then everything else Just Works.

  • maxmcd a year ago

    Twirp is lovely, but we kind of hit a wall when using it internally because it doesn't have a streaming story: https://github.com/twitchtv/twirp/issues/3

    If you don't need to stream data it is excellent.

  • Groxx a year ago

    Yea, and the code gen (for Go at least) very clearly assumes you're using a monorepo and how dare you think of doing anything else you monster.

    E.g. there's a type registry, which means you can't ever have the same proto type compiled by two different configs (it'll panic at import time). In a monorepo that's (potentially) fine, but for the rest of the world it means libraries can't embed the generated code that they rely on (if the spec is shared), which means they can't customize it (no perf/size/etc tradeoff possible), can't depend on different versions of codegen or .proto files (despite code clearly needing specific versions, and breaking changes to the generated code are somewhat common), can't have convenience plugins for things that would benefit from it, etc.

    And all of this to support... an almost-completely-unused text protocol. And `google.protobuf.Any` auto-return-value-typing, but tbh I think that's simply a bad feature, and it would be better modeled as a per Any deserialize call registry, where you can do whatever the heck you like (or not use it, and just `.UnmarshalTo(&out)` with the correct type).

    ---

    What really gets my goat here is that none of this makes sense at all for a protocol. The whole point of having a language-and-implementation-agnostic binary protocol is to not be dependent on specific codegen / languages / etc, but per above the whole Go protobuf ecosystem is rigidly locked in at all times, and nearly every change is required to be a breaking change... and if you make that breaking change in a new Go module version, like you should, you immediately break anyone who uses two of them at once, so it must also always be a semver-violating breaking change.
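
To illustrate the per-call alternative Groxx describes a couple of paragraphs up (just `.UnmarshalTo(&out)` with the correct type), a minimal Go sketch; the pb.Foo type and its package are hypothetical:

    package anydemo

    import (
        "google.golang.org/protobuf/types/known/anypb"

        pb "example.com/gen/foopb" // hypothetical generated package
    )

    // decodeFoo unpacks an Any into the concrete type this caller expects.
    // The caller names the type explicitly, so the decode itself does not go
    // through the process-global type registry (which UnmarshalNew would use).
    func decodeFoo(a *anypb.Any) (*pb.Foo, error) {
        var out pb.Foo
        if err := a.UnmarshalTo(&out); err != nil {
            return nil, err
        }
        return &out, nil
    }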

    • ljm a year ago

      I used protobuf extensively with Kafka and I remember having to be quite particular about how the proto files/packages were arranged so as to avoid naming and versioning conflicts.

      We never generated go code from it, but it took a bit of fine tuning to get generated code that felt at least somewhat ergonomic for Ruby and Typescript. It usually involved using some language specific alternative to protoc for that language because the code generated by protoc itself was practically unreadable. IIRC in the case of Typescript I had to write a script that messed around with the directory structure so you could use sensible import paths and aliases, because TS itself wasn't discovering them automatically without it.

      That's stuff you can work with and solve technically. Initial faff but it's one and done. The worst problem I had with it was protobuf3 stating every field is optional by default, and the company I worked at basically developed a custom schema registry setup that declared every field as required by default, with a custom type to mark a field as optional. It turned literally every modification to a protobuf definition into a breaking change and, what's worse, it wasn't done end to end and the failures were silent, so you'd end up with missing data everywhere without knowing it for weeks.

      • Groxx a year ago

        >the failures were silent, so you'd end up with missing data everywhere without knowing it for weeks.

        This is the main reason I think protobuf's "zero values are simply not communicated" is fundamentally wrong. Missing data is one of the easiest flaws to miss, and it tends to cause problems far away from the source of the flaw, in both time and space, which makes it extremely hard to notice and fix.

        I get the arguments in its favor. I get the arguments in favor of "everything is optional by default". But presence is utterly critical in detecting flaws like this, and it can't always be addressed in a backwards-compatible way by application code. E.g. in proto's case, it's not possible because that data does not exist, and adding it would change the binary data. Even binary-compatible workarounds like "add a field with a presence fieldset" aren't usable because that unrecognized field will be silently ignored by older consumers, so you're right back to where you started.

        It needs to exist from day 1 or you're shooting your users in the feet.
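
A minimal Go sketch of the presence gap described above, assuming a hypothetical generated package userpb whose .proto declares `int32 age = 1;` and `optional int32 score = 2;` (with the proto3 `optional` keyword, the generated Go field is a pointer, so absence is observable):

    package main

    import (
        "fmt"
        "log"

        "google.golang.org/protobuf/proto"

        pb "example.com/gen/userpb" // hypothetical generated package
    )

    func main() {
        raw, err := proto.Marshal(&pb.User{}) // nothing set at all
        if err != nil {
            log.Fatal(err)
        }

        var u pb.User
        if err := proto.Unmarshal(raw, &u); err != nil {
            log.Fatal(err)
        }

        // Without `optional`, zero values are simply not on the wire:
        // "never set" and "explicitly set to 0" are indistinguishable here.
        fmt.Println(u.GetAge()) // 0, but was it ever sent?

        // With `optional`, presence survives the round trip as a nil check.
        fmt.Println(u.Score == nil) // true: the field was genuinely absent
    }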

    • kostyay a year ago

      > E.g. there's a type registry, which means you can't ever have the same proto type compiled by two different configs (it'll panic at import time). In a monorepo that's (potentially) fine, but for the rest of the world it means libraries can't embed the generated code that they rely on (if the spec is shared), which means they can't customize it (no perf/size/etc tradeoff possible), can't depend on different versions of codegen or .proto files (despite code clearly needing specific versions, and breaking changes to the generated code are somewhat common), can't have convenience plugins for things that would benefit from it, etc.

      I actually forgot about this when writing the article. This is a major pain in the ass in both Go and Python and basically forces you to ensure that no two services have a file with the same path, like "api/users/service.proto". There have been multiple instances where we literally had to rename a proto file to something like reponame_service.proto to work around this limitation.

  • kostyay a year ago

    Twirp looks cool. Is it still being maintained? Would you choose it over connectrpc?

andy_ppp a year ago

gRPC is an absolute pain in the backside. You’re not Google; use the simple thing until you need the binary format that has bad semantics for versioning schemas.

talkingtab a year ago

This was never worth it. From the beginning. Large corporations like Google try to lock developers into technology. They try to promote themselves as cool tech to job applicants. For example .NET from Microsoft, Swift from Apple, on and on. There are a few examples of good stuff coming out, but in general just say no...

And certainly if you want to get a job at one of these places, learn their technology, but then once even that company stops using it, where are you?

  • skybrian a year ago

    It locks you into a well-documented protocol, implemented by open source software?

    • jauntywundrkind a year ago

      We should remain forever free, no chains on us, reinventing our own ad-hoc means of communication internally at each company! /s

  • johannes1234321 a year ago

    One may argue about the reasons for (publishing) gRPC, but for .NET and Swift the purpose was clear: they wanted to provide attractive options for developing applications on their respective platforms. Whether they succeeded is somewhat subjective, but I don't see what's bad about it for the people who like those platforms.

  • pathartl a year ago

    So we should just never use some framework?

tomp a year ago

If you're picking communication protocols for your API endpoints, you're doing it wrong.

The ideal API supports `/api/call.json` and `/api/call.csv` and `/api/call.arrow` as well as `/api/call.grpc`, and the only thing that differs between these is the serializer (which is standardized and very well tested).

If the app is buggy, I want to (1) check which API calls it's making and (2) be able to run those API calls myself, which is why I (ideally) need a text-based (human-readable) format.
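
A minimal Go sketch of the "only the serializer differs" idea; the routes and payload type are illustrative, and a real implementation would register the remaining formats the same way:

    package main

    import (
        "encoding/csv"
        "encoding/json"
        "log"
        "net/http"
        "path"
        "strconv"
    )

    // Row is an illustrative payload; every format serves the same data and
    // only the serializer at the end differs.
    type Row struct {
        Name  string `json:"name"`
        Count int    `json:"count"`
    }

    func handleCall(w http.ResponseWriter, r *http.Request) {
        data := []Row{{"alpha", 1}, {"beta", 2}}

        switch path.Ext(r.URL.Path) {
        case ".json":
            w.Header().Set("Content-Type", "application/json")
            _ = json.NewEncoder(w).Encode(data)
        case ".csv":
            w.Header().Set("Content-Type", "text/csv")
            cw := csv.NewWriter(w)
            for _, row := range data {
                _ = cw.Write([]string{row.Name, strconv.Itoa(row.Count)})
            }
            cw.Flush()
        default:
            http.Error(w, "unsupported format", http.StatusNotAcceptable)
        }
    }

    func main() {
        http.HandleFunc("/api/call.json", handleCall)
        http.HandleFunc("/api/call.csv", handleCall)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }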

  • ljm a year ago

    I need to do some realtime stuff which involves some kind of realtime updates of some sort, I guess I can call `/api/call.sock` and it'll just work?

    Oh, I'm also supporting streaming. Will `/api/call.rtmp` work for me?

    What's worse, I even allow voice chat and video calls! I guess I can just do `/api/call.webrtc`?

    I also forget that I serve a lot of P2P traffic, so I guess I can just add `api/call.udp` too.

    The problem I have now is, how do I serve this over HTTPS? I can't pick the protocol because that'd be doing it wrong. Do I serve extra APIs like `call.rtmps`, `call.webrtcs`, `call.udps`?

    I haven't even told you how fucking awkward it is to constantly enter my login details to work with my git repository and `github.com/my/repo.ssh` just gives me a 404.

    APIs are a lot more than just content negotiation; the protocol is a necessary part of it and it just so happens with the web (not the internet) that the protocol is HTTP(S) over TCP and advances in technology have continued to stress its utility.
