HTTP/1.1 and HTTP/2: A Performance Comparison for Python
If you don't pay any attention to my Twitter feed, you might have missed the fact that I have spent the last few months working on a client-side HTTP/2 stack for Python, called hyper. This project has been a lot of fun, and a gigantic amount of work, but has finally begun to reach a stage where some of the more egregious bugs have been worked out.
For this reason, I think it's time to begin analysing the relative performance of HTTP/1.1 and HTTP/2 in some example use-cases, to get an idea of where things stand.
Like any good scientist, I don't want to just dive in and explore: I first want to establish what I expect to see. These expectations come from two places: familiarity with hyper, and familiarity with HTTP in general.
My expectation is that hyper is, in its current form, going to compare to the standard Python HTTP stack as follows:
- hyper will be more CPU intensive
- hyper will be slower
- hyper will increase the amount of data sent on the network for workloads involving a small number of HTTP requests
- hyper will decrease the amount of data sent on the network for workloads involving a large number of HTTP requests
This is for the following reasons. Firstly, hyper will consume more CPU because it has substantially more work to do than a standard HTTP stack. hyper needs to process each HTTP/2 frame (of which there will be at least 4 per request-response cycle), burning CPU all the while to do so. Conversely, the standard HTTP/1.1 stack in Python can do relatively little work, reading headers line-by-line and then the body in one go, requiring almost no transformation between wire format and in-memory representation.
Secondly, hyper will be slower because it has to cross from user-space to kernel-space and back again twice per frame read. This is because hyper needs to read 8 bytes from the wire (to find out the frame length), followed by the data for the frame itself. This context-switching is expensive, and not something that needs to be done in quite the same way for HTTP/1.1.
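To make that concrete, here's a rough sketch of the two reads a client must make per frame. This is not hyper's actual code: it uses the 8-byte frame header from the h2-12 draft (14-bit length, type, flags, 31-bit stream identifier) and replays a hand-built frame through io.BytesIO rather than a real socket.

```python
import io
import struct

# A draft-style HEADERS frame: 8-byte frame header, then the payload.
payload = b"\x82\x86"  # stand-in header-block fragment
frame = struct.pack(">HBBL", len(payload), 0x01, 0x04, 1) + payload

wire = io.BytesIO(frame)

# Read 1: the fixed 8-byte frame header, to learn the payload length...
length, ftype, flags, stream_id = struct.unpack(">HBBL", wire.read(8))
length &= 0x3FFF          # top two bits of the length field are reserved
stream_id &= 0x7FFFFFFF   # high bit of the stream identifier is reserved

# Read 2: ...then the payload, whose size we only now know.
body = wire.read(length)

print(length, ftype, flags, stream_id)  # 2 1 4 1
```

An HTTP/1.1 client, by contrast, can slurp the whole response body in a single read once it has seen the headers.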
For workloads involving a small number of requests, HTTP/2 does not provide particular bandwidth savings or improve network efficiency. The bandwidth savings provided by HTTP/2 come from header compression, which is at its most effective when sending and receiving multiple requests/responses with very similar headers: for small numbers of requests, this provides little saving. The network efficiency savings come from having long-lived TCP connections grow their congestion window appropriately, but this benefit will be lost when sending relatively small numbers of requests. As the cherry on top of this cake, there's some additional HTTP/2 overhead in the form of framing and window management, which will lead to HTTP/2 needing to send more bytes than HTTP/1.1 does.
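HPACK, HTTP/2's header compression scheme, works by indexing previously-seen headers rather than by running a general-purpose compressor, but a quick zlib experiment illustrates the same intuition: compressing one set of headers barely helps, while repeated near-identical headers compress dramatically.

```python
import zlib

# A plausible set of request headers (contents are illustrative).
headers = (
    b"GET /timeline HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: python-demo\r\n"
    b"Accept: text/html,application/xhtml+xml\r\n"
    b"Accept-Encoding: gzip\r\n\r\n"
)

one = len(zlib.compress(headers))          # one request: modest saving
fifty = len(zlib.compress(headers * 50))   # fifty near-identical requests

# The per-request header cost collapses once the same headers repeat.
print(one, fifty / 50)
```

The second figure, the amortised per-request cost, is a small fraction of the first, which is exactly the regime where HTTP/2's header compression earns its keep.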
HTTP/2's major win should be in the area of workloads with large numbers of requests. Here, HTTP/2's header compression and long-lived connections should be expected to provide savings in network usage.
These are my expectations. Let's dive in and see what we can see.
The Set Up
First, I need to install hyper. Because of some ongoing issues regarding upstream dependencies I will be running this test in Python 3.4 using the h2-10 branch of hyper (which, despite its name, implements the h2-12 implementation draft of HTTP/2). As such, I went away and installed that branch using pip.
Let's confirm that hyper is installed and functioning by importing it and sending a test query to Twitter, who have an HTTP/2 implementation running on their servers.
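A minimal check might look like the following sketch using hyper's HTTP20Connection. Running it for real needs hyper installed and a live HTTP/2 endpoint on the network, so the import lives inside the function and the call itself is left commented.

```python
def http2_get(host, path="/"):
    # hyper's HTTP20Connection speaks HTTP/2 directly over TLS. The import
    # sits inside the function so this sketch loads even where hyper is
    # not installed.
    from hyper import HTTP20Connection

    conn = HTTP20Connection(host, 443)
    conn.request("GET", path)
    resp = conn.get_response()
    return resp.status, resp.read()

# Against a live HTTP/2 endpoint (at the time of writing):
# status, body = http2_get("twitter.com")
```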
Now, there's a big caveat with the above that I failed to mention. By default, http.client does not allow for gzip-compressed content, while HTTP/2 mandates it. I left this asymmetry in for the sake of example: after all, it does mean that a bare-minimum HTTP/2 implementation is strictly more efficient than a bare-minimum HTTP/1.1 implementation. For reasons that are opaque to me, nghttp2.org doesn't return gzip in HTTP/1.1, even with the appropriate Accept-Encoding header set. However, this 50% performance improvement on a standard HTML website is not to be expected across the board, as most websites will allow compressed data access. At the moment, HTTP/2 is not widely-enough deployed to write a scraper that comprehensively demonstrates the improvement of HTTP/2 over HTTP/1.1 in a truly fair test.
Let's consider another point: CPU usage.
I expect that hyper will be substantially more CPU intensive than a standard HTTP/1.1 client stack. I've outlined some reasons above, so I won't rehash them. This is hard to test in Python from the shell itself, and warrants a longer discussion.
Note that exactly how this affects CPU usage is hard to gauge, and varies from workload to workload. http.client, for example, reads HTTP headers line-by-line, by calling readline() repeatedly. This actually means that http.client has a tendency to context-switch a lot: in header-heavy body-light workloads, it'll probably do so more than hyper does.
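Here's a toy illustration of that call pattern, replaying a canned response through io.BytesIO rather than a real socket. (On a real connection http.client reads from a buffered socket file, so readline() calls don't map one-to-one onto syscalls, but the shape of the work is the same.)

```python
import io

# A canned HTTP/1.1 response: status line, three headers, blank line, body.
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 5\r\n"
       b"Server: demo\r\n"
       b"\r\n"
       b"hello")

fp = io.BytesIO(raw)
reads = 0

# One readline() for the status line, one per header, one for the blank
# line terminating the header block -- just as http.client does.
while True:
    line = fp.readline()
    reads += 1
    if line in (b"\r\n", b"\n", b""):
        break

body = fp.read()  # then the body, in one go

print(reads)  # 5 readline() calls for this tiny response
```

A header-heavy response multiplies those small reads, while the body still costs a single large one, which is why the balance between http.client and hyper shifts with the workload.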
Summary
We can see that in the basic case http.client has the edge on hyper, but that for certain kinds of workloads hyper is likely to be substantially better. In particular, repeated access to the same site is both easier and faster, employing header compression and the request multiplexing powers of HTTP/2 to achieve substantial speedups, even at the cost of increased complexity in the protocol stack itself.
A More Realistic Comparison: Requests
Let's do a comparison that is more likely to match the current HTTP use-cases of most Python developers. To do so, we'll take advantage of everyone's favourite HTTP library, Requests. hyper contains a Requests Transport Adapter, which means that you can use HTTP/2 with Requests already. This is likely to be a test that shows HTTP/1.1 in a better light, thanks to Requests using connection pooling and body compression, and because it prevents hyper from multiplexing requests.
Let's do a similar task, web scraping, but now using requests and Twitter. Let's whip up some code.
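The plumbing looks something like this sketch. The adapter class is hyper.contrib.HTTP20Adapter; the imports live inside the function since running it needs both requests and hyper installed, and the actual scraping call is left commented.

```python
def http2_session():
    # Mount hyper's Transport Adapter so that every https:// request made
    # through this Session travels over HTTP/2 where the server supports it.
    import requests
    from hyper.contrib import HTTP20Adapter

    session = requests.Session()
    session.mount("https://", HTTP20Adapter())
    return session

# Scraping over HTTP/2 then looks exactly like ordinary Requests code:
# s = http2_session()
# r = s.get("https://twitter.com/")
```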
This is revealing. In this example, everything changes, and HTTP/2 is the loser. Why is that?
Well, let's consider the differences. First, Twitter does compress their response bodies over HTTP/1.1. This eliminates one of HTTP/2's main advantages in the previous test. Next, this test is strictly serial: we can't be uploading requests and downloading responses at the same time because Requests simply is not architected for it. This costs HTTP/2 its advantage of more efficient use of a TCP connection. As an additional bit of fun, the above example only uses a single TCP connection per function in the HTTP/1.1 case thanks to Requests' connection pooling. This means that HTTP/2 doesn't gain the advantage of opening fewer TCP connections.
However, all of the overhead involved in making HTTP/2 requests continues to remain. Large response bodies incur a fairly substantial reading overhead in HTTP/2 due to the framing: even using just four HTTP/2 DATA frames to send a response body will cause hyper to need to make eight socket.read() calls just to pull the data off the wire. Additionally, hyper will need to maintain two flow-control windows per request, and will occasionally need to stop to send a flow-control frame to let Twitter send more data, further adding to the socket-based overhead. As a fun point on top, it's quite possible that, in HTTP/2, hyper will end up downloading more data than in the HTTP/1.1 case depending on how well Twitter handle the per-DATA-frame padding that HTTP/2 allows.
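A toy model of the receive-side window shows where those extra socket writes come from. The 64 KB starting window matches HTTP/2's default, but the refill threshold and strategy here are hypothetical, illustrative numbers, not hyper's actual behaviour.

```python
class FlowControlWindow:
    """Toy receive-side flow-control window for one stream."""

    def __init__(self, size=65535):
        self.size = size        # bytes the peer may still send us
        self.updates_sent = 0   # WINDOW_UPDATE frames we had to write

    def receive_data(self, n):
        # Every incoming DATA frame shrinks the window...
        self.size -= n
        # ...and once it runs low, we must stop and write a WINDOW_UPDATE
        # frame back to the server before it will send more data.
        if self.size < 16384:
            self.updates_sent += 1
            self.size = 65535


window = FlowControlWindow()
for _ in range(20):
    window.receive_data(8192)  # twenty 8 KB DATA frames arrive

print(window.updates_sent)  # 3 extra writes just to keep data flowing
```

And remember that HTTP/2 maintains one of these per stream plus one for the whole connection, so the bookkeeping (and the socket traffic it generates) doubles up.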
Summary
This has been a fairly shallow dive into the ways HTTP/1.1 and HTTP/2 compare, considering a couple of example use-cases and comparing their outputs. What can we conclude?
The short answer, at least for me, is that HTTP/2 is underwhelming. For effectively-serial clients like Requests doing web-scraping (or any form of work where the response body is the major component of bandwidth use), HTTP/2 is a bust. The overhead in terms of complexity and network usage is massive, and any gains in efficiency are eliminated if HTTP/1.1 is deployed in any sensible way (allowing gzip and connection reuse). For clients that are more parallel, HTTP/2 has the potential to have some advantages: it limits the number of sockets you need to create, it more efficiently uses TCP connections and it avoids the need for complex connection-pooling systems. However, it does so at the price of tremendous complexity. The computational workload is substantial compared to HTTP/1.1, and ends up providing relatively limited benefits to the client.
Who're the big winners from HTTP/2, then? Two answers: browsers and servers. For servers, they have to handle fewer concurrent connections (so tying up fewer system resources) and can more effectively distribute resources to clients (thanks to server push). For browsers, they can avoid the current limit on the number of concurrent connections per host, while taking advantage of complex flow-control and prioritisation schemes to maximise the efficiency of their bandwidth usage. This is difficult for a generic non-browser client to do in any intelligent way without pushing the burden of those decisions onto their user, and even if it worked, most non-browser clients don't have these specific problems.
This should not come as a surprise. The big stakeholders in HTTP/2 are Google (browser and server provider), Mozilla (browser provider mostly), Microsoft (browsers and servers) and Akamai (servers, kinda). Those are the hostnames that seem to come up most when I do a quick search of the mailing list archives. Unsurprisingly, these stakeholders have focused on their most common use-cases, and have come up with a protocol that suits their needs very well. Sadly, those decisions don't necessarily translate into big wins for those of us that are focused on non-browser client-side interactions.
Don't get me wrong, it's not all gloomy. In some use-cases (ones where headers dominate the request/response sizes) HTTP/2 is a big win for non-browser clients. Additionally, HTTP/2 bundles in some awesome mandatory support for TLS (things like requiring TLSv1.2, for example), ensuring that most well-deployed HTTP/2 services will be very secure indeed. These are good things, and their inclusion should not be overlooked.
With all that said, I encourage cautious optimism regarding HTTP/2. I don't believe that HTTP/2 will replace HTTP/1.1 in all cases, or even necessarily in a majority. Mostly, the HTTP Working Group hold the same viewpoint, though curiously some people disagree, a position that both I and Poul-Henning Kamp find a bit weird.
Nevertheless, keep an eye on it. If you think it's an interesting problem, I'd love more contributors to hyper. We've got a set of contributors guidelines: please read them and then dive in. If you just want to keep reading about HTTP/2, I'll be writing about it from time-to-time on my blog, so keep an eye on that if you're interested in more.
-- Cory
(Feel free to follow me on Twitter, or @message me if you want to chat more about HTTP/2. If you want to chat privately, you can email me at cory@lukasa.co.uk: if you're the kind that likes encryption, my GPG key is here.)