Designing Network Protocols
journal.paul.querna.org
As someone who troubleshoots networks for a living, I think the value of easily understood information can't be overstated. Normal factors such as lack of understanding of a system, miscommunication, false assumptions, false information, all types of bugs, and operator error already plague troubleshooting. Anything you can do to simplify is extremely important. Little things like being able to read debugs directly out of Wireshark can make the difference between solving something in minutes rather than hours, or days rather than weeks.
Unfortunately serviceability is not normally high in the initial product requirements, but I've seen a direct correlation between customer satisfaction in support and products/protocols/designs with good serviceability.
I think the short version is: in the past 5 years, simple wire-oriented serialisation formats have become much more common. At the time you had your pick of ASN.1 (humongous) or Thrift (brand new).
Not particularly fair. You also had XDR (ubiquitous on Unix systems), IIOP, TLV, ICE... not to mention what every protocol designer for the past 20 years has used: network byte order integers and ASCII/UTF-8 strings.
Some people just like ASCII, human readable protocols. There's nothing wrong with that, but it's a little silly to suggest that the options for a packed binary encoding in 2007 were limited because Thrift and Protocol Buffers were too new.
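For anyone who hasn't seen the "network byte order integers and ASCII/UTF-8 strings" approach spelled out, here is a minimal sketch in Python. The layout (a 2-byte message type followed by a 4-byte length and a UTF-8 payload) and the function names are assumptions made up for illustration, not a format taken from the article or from any of the protocols named above.

```python
import struct

# Hypothetical header layout: uint16 message type, uint32 payload length.
# "!" selects network byte order (big-endian) with no padding.
HEADER_FORMAT = "!HI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def encode_message(msg_type: int, payload: str) -> bytes:
    """Pack the header in network byte order, then append the UTF-8 payload."""
    body = payload.encode("utf-8")
    return struct.pack(HEADER_FORMAT, msg_type, len(body)) + body


def decode_message(data: bytes):
    """Return (msg_type, payload) or raise ValueError on truncated input."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    msg_type, length = struct.unpack_from(HEADER_FORMAT, data, 0)
    body = data[HEADER_SIZE:HEADER_SIZE + length]
    if len(body) != length:
        raise ValueError("truncated payload")
    return msg_type, body.decode("utf-8")
```

The length prefix is what lets a reader find the end of the string without scanning for a delimiter; it does nothing for extensibility on its own.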
You're right, to my shame. I should've thought of XDR myself, after suffering through a lecture on it at uni.
I'd be interested in the original designer's remarks on using existing wire serialisation formats.
I mentioned ASN.1 DER, but I honestly didn't think I should go into a history of XDR or other encodings. I guess I can't skip any history in a blog post....
I was responding to a comment, not your post. I don't think you really need to justify using an ASCII protocol (though, again, I think HTTP query arguments are a poor choice).
I don't believe a tl;dr is necessary for this short article.
I enjoyed reading through his thought process for designing a simple protocol which is:
* Easy to use (requiring little/no additional libraries)
* Easy to extend (simple keyword/value extensions)
* Immune to changes in technology
* (above all) easy to understand
Yet he himself says he considered and rejected a binary protocol for the reason I gave: "I considered using a binary format, but the immediate problem was having extendable fields.", going on to point out that he rejected Thrift because it was too new and ASN.1 DER because it was too big.
That said, I think he didn't want a binary format in any case -- his "doing it again today" remarks point to JSON.
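To make the "simple keyword/value extensions" point concrete, here is a rough sketch of a query-string style message in Python using only the standard library. The field names (`v`, `host`, `port`, `future`) and the helper functions are hypothetical; the article's actual field set is not reproduced here.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical set of fields this (imaginary) receiver understands.
KNOWN_FIELDS = {"v", "host", "port"}


def encode_fields(fields: dict) -> bytes:
    """Serialise key/value pairs as an HTTP-style query string."""
    # urlencode percent-escapes anything outside the safe ASCII set.
    return urlencode(fields).encode("ascii")


def decode_fields(wire: bytes) -> dict:
    """Parse a query string back into a dict, keeping the last value per key."""
    parsed = parse_qs(wire.decode("ascii"), keep_blank_values=True)
    return {key: values[-1] for key, values in parsed.items()}


def known_only(fields: dict) -> dict:
    """Drop fields this receiver does not know about, without failing."""
    return {k: v for k, v in fields.items() if k in KNOWN_FIELDS}
```

A newer sender can add `future=x` and an older receiver simply ignores it via `known_only`, which is the extension property being discussed. Note that every value comes back as a string, so typing is left to the receiver.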
Actually, the short version is: I wanted it easily debuggable by a network admin. (It's a point he made repeatedly). All the other arguments were moot.
So the network protocol must be in text so that a network admin could debug it? This is absurd.
How does a network admin debug a binary protocol for which no dissector has been implemented/merged into core for Wireshark, and no decoder has been written for tcpdump?
It's obviously doable, but it's very painful.
Isn't Wireshark extensible in Lua?
I can see both sides of the argument here, but basing a protocol on text just for the ease of eyeballing it on-the-wire seems like optimizing for the uncommon case.
Heck, almost any decent protocol should only have ciphertext on-the-wire anyway.
That's more or less like saying "well they can just write the decode". They're network administrators. If you use an ASCII protocol, they don't have to do anything.
I'm saying someone can write the decode and share it on their blog post or Github and your admin can start using it without having to recompile Wireshark. (I think, haven't actually tried it myself).
But even still, this only matters if:
A. The protocol is so new that Wireshark isn't shipping a parser,
B. the admin's stuff isn't working,
C. the admin can't get his stuff working by normal troubleshooting and must resort to observing the protocol,
D. the admin can't get his stuff working by observing the binary representation of the protocol, and
E. the admin actually can get his stuff working with a transliterated ASCII representation of the protocol.
Certainly I would probably find it easier to troubleshoot a text-based protocol too. I just think it's a relatively minor case in the grand scheme of things.
How does a sysadmin debug a binary application for which he doesn't have any symbols?
On the other hand, are Wireshark and tcpdump now the gatekeepers for new protocols?
What's your point? I'm not making a value judgement.
You say that to me a lot.
My point is that I imagine a network designer shouldn't focus on Wireshark or tcpdump integration over other non-functional requirements such as, well, network performance.
Network performance isn't as visible as the non-functional requirement of inspectability because it is amortised over potentially millions of machines, whereas inspectability is an immediately visible issue to the select few who "pop the hood" to fix an issue or simply to have a look.
For example: in terms of network capacity, I wonder how much HTTP headers cost all of us collectively. Probably a lot more than the cost of making a Wireshark plugin and having sysadmins install it as necessary.
Edit: put another way, I think designers should prioritise the needs of the people who pay the cost of network operation over the convenience of the operators.
There's a feedback loop here -- if it's too hard and thus very expensive to operate a system, then optimising for performance was a false win. But I don't think this is such a case, especially since as you pointed out elsewhere there are a number of very mature binary wire formats that were extant in 2007.
See: http://cr.yp.to/sarcasm/modest-proposal.txt
"I implore [you] to remember Dave and Virginia, preying on the drug addicts of the next generation and the sexually dissatisfied men of the previous generation. How different their careers could have been if their parents had not downloaded so many terabytes of data! We must not abandon our children to such a fate."
Not as absurd as you'd think. You don't want to debug the protocol itself, but you want to be able to easily read what messages were exchanged.
I get the rationale. But I think it's weak, and this entire post is lots of fluff around that core rationale. (I was writing extensible binary protocols back in 1988, and it never struck me as particularly difficult even then.)
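One common way to get extensible fields in a binary format is tag-length-value (TLV) encoding, which was mentioned earlier in the thread. The sketch below is a generic illustration in Python, not the commenter's 1988 design; the 1-byte tag, 2-byte length layout and the tag numbers are assumptions.

```python
import struct

# Hypothetical field header: uint8 tag, uint16 length, network byte order.
FIELD_HEADER = "!BH"
FIELD_HEADER_SIZE = struct.calcsize(FIELD_HEADER)


def encode_tlv(fields) -> bytes:
    """Encode an iterable of (tag, value_bytes) pairs as consecutive TLV records."""
    out = bytearray()
    for tag, value in fields:
        out += struct.pack(FIELD_HEADER, tag, len(value))
        out += value
    return bytes(out)


def decode_tlv(data: bytes, known_tags) -> dict:
    """Decode TLV records, silently skipping tags the receiver does not know."""
    fields = {}
    offset = 0
    while offset < len(data):
        if offset + FIELD_HEADER_SIZE > len(data):
            raise ValueError("truncated field header")
        tag, length = struct.unpack_from(FIELD_HEADER, data, offset)
        offset += FIELD_HEADER_SIZE
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated field value")
        offset += length
        if tag in known_tags:
            fields[tag] = value
    return fields
```

Because every record carries its own length, an old receiver can step over a tag it has never seen, which is the same forward-compatibility property the key/value text format provides.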
There's a complete chapter in The Art of Unix Programming about the importance of being textual:
http://catb.org/~esr/writings/taoup/html/textualitychapter.html
HTTP-style query strings are a horrible format, whether you like ASCII or not.
I would accept the ease of debugging argument if these messages weren't so small and so common. 1000 servers constructing strings and sending them over the network once a second is a nontrivial waste of resources.
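As a rough way to put numbers on this, the sketch below compares the payload size of one made-up message in query-string form against a packed binary form, then scales it to 1000 servers at one message per second. The message contents and the binary layout are hypothetical, and the figure covers payload bytes only: it ignores IP/UDP/TCP header overhead and the CPU cost of building and parsing the strings.

```python
import socket
import struct
from urllib.parse import urlencode

# Hypothetical message: protocol version, IPv4 address, port.
VERSION, HOST, PORT = 1, "192.168.1.10", 8080


def text_payload() -> bytes:
    """The message as an HTTP-style query string."""
    return urlencode({"v": VERSION, "host": HOST, "port": PORT}).encode("ascii")


def binary_payload() -> bytes:
    """The same message packed as uint8 version, 4-byte IPv4, uint16 port."""
    return struct.pack("!B4sH", VERSION, socket.inet_aton(HOST), PORT)


def payload_bytes_per_second(payload: bytes, servers: int = 1000, rate_hz: int = 1) -> int:
    """Aggregate payload bytes per second across all senders."""
    return len(payload) * servers * rate_hz


if __name__ == "__main__":
    for name, payload in (("text", text_payload()), ("binary", binary_payload())):
        print(name, len(payload), "bytes/msg,",
              payload_bytes_per_second(payload), "payload bytes/s")
```

Whether the difference counts as "nontrivial" depends on the deployment; for messages this small, per-packet header overhead may well be larger than either payload.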