curl rate limits


The curl project merged #19384, #20033 and #20228, which together re-implement rate limiting of transfers in libcurl. This will be part of the upcoming 8.19.0 release at the beginning of March 2026.

What is it, why did we change it and how?

Verasca Dam via swissdams.ch

libcurl’s rate limiting

libcurl offers the options CURLOPT_MAX_SEND_SPEED_LARGE and CURLOPT_MAX_RECV_SPEED_LARGE to set a per-transfer limit, in bytes per second, for uploads and downloads. The same feature is available in the curl tool as the command line option --limit-rate <speed>. Running

> curl --limit-rate 1m https://curl.se/download/curl-8.18.0.tar.xz

tells curl to not exceed 1 megabyte per second when downloading the curl release tarball.

This is useful when you want to make sure the internet link in your home has enough capacity left for other things. The Steam application, for example, lets you limit the rate of game updates so you can continue playing or watching Twitch while they download.

Another use case is media streaming servers. When you listen to Spotify or watch movies on Netflix (or other streaming services), the server your client talks to usually contacts a backend storage server to retrieve the media it needs (it is not possible to keep a complete copy of the Netflix library on every client-facing server).

To deliver the video data to the user at, say, 5 Mbit/s, it makes sense to retrieve it from the storage backend at the same rate. For one, the user may abort the consumption early, and all data the storage node already transmitted turns into waste, of both energy and bandwidth. Second, it loads the backbone network more fairly: when thousands of users press “play” at the same time, the load is easier to manage.

design space

As with every design, the trade-offs are the kicker. A rate limit implementation has to balance three conflicting properties:

  1. Precision: the transfer should not be faster than the set limit, but also not unnecessarily slower.
  2. CPU usage: a “slowed” transfer should consume minimal CPU time.
  3. Smoothness: the behaviour should stay the same over the whole transfer and apply equally to all transfer sizes.

How are these in conflict?

The optimal precision+smoothness is achieved when sending/receiving a single byte each time in short intervals. With a rate limit of 1 MB/s that would be reading one byte per microsecond. That easily observes the limit (precision) and also makes it smooth (e.g. a 500KB download would take 0.5 seconds, as it should). But CPU load would be high.

Optimizing precision+CPU would read up to 1 MB each second. On a fast connection, that data may all arrive in the first millisecond. Unless the whole download is already done, the transfer then waits 999 ms before continuing. That is not very smooth, and transfers smaller than the rate limit are not limited at all.

A simple approach to CPU+smoothness would be to read 1 byte each second. Very efficient and smooth, but precision is gone: the transfer runs far below the limit.

new implementation

The new rate limit implementation uses a token bucket at its core. The “bucket” initially contains “limit” tokens. Before receiving data, a transfer checks how many tokens remain in the bucket, tries to receive at most that many bytes and afterwards “drains” the bucket by the number of bytes it actually received.

When the bucket becomes empty, receiving is no longer allowed and the transfer needs to wait until the bucket is filled again. Since the rate limit operates per second, the wait time is simply “1s - time_to_drain”.
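The mechanism described above can be sketched like this (a hypothetical Python model with made-up names; the real implementation is C code inside libcurl):

```python
import time

class TokenBucket:
    """Sketch of a per-second token bucket rate limiter."""

    def __init__(self, limit):
        self.limit = limit                 # bytes per second
        self.tokens = limit                # the bucket starts full
        self.filled_at = time.monotonic()  # when it was last filled

    def may_receive(self):
        """How many bytes the transfer may try to receive right now."""
        return self.tokens

    def drain(self, nbytes):
        """Account for the bytes actually received."""
        self.tokens -= nbytes

    def wait_time(self):
        """Seconds to wait before the next refill; 0 while tokens remain.
        This is the "1s - time_to_drain" from the text."""
        if self.tokens > 0:
            return 0.0
        time_to_drain = time.monotonic() - self.filled_at
        return max(0.0, 1.0 - time_to_drain)

    def refill(self):
        self.tokens = self.limit
        self.filled_at = time.monotonic()
```

A transfer loop would alternate between `may_receive()`/`drain()` while tokens remain and sleeping for `wait_time()` before calling `refill()`.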

This gives good precision (the limit is never exceeded) and almost no CPU overhead. Smoothness suffers, though. Let’s look at the timelines for different transfer sizes with a rate limit of 1 MB/s:

Transfer Size, limit 1 MB/s
3.1 MB |**....**.....**.....*|     *: receiving
3 MB   |**....**.....**|           .: waiting
2.5 MB |**....**.....*|
2 MB   |**....**|
1 MB   |*|
time   +------+------+------+------
       0s     1s     2s     3s

Meh. The token bucket always enters the last second full, and that last bucket’s worth of data then arrives as fast as the connection allows. For long transfers this does not matter much, but shorter transfers are not smooth, or are not restricted by the rate limit at all.
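The durations in the timelines above follow from a simple back-of-the-envelope model (hypothetical, not libcurl code), assuming the link itself is much faster than the limit:

```python
import math

def unadjusted_duration(total, limit):
    """Approximate duration (in seconds) of a transfer with the plain
    token bucket: each bucket's worth of data arrives near-instantly,
    then the transfer waits for the next refill. The final partial
    bucket arrives in a burst, so only the waits count."""
    if total <= limit:
        return 0.0  # fits into the initial bucket: effectively unlimited
    return float(math.ceil(total / limit) - 1)
```

With a 1 MB/s limit this gives 0 s for a 1 MB transfer, 2 s (plus the final burst) for 2.5 MB and 3 MB alike, and 3 s for 3.1 MB, matching the diagram.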

tweaking the bucket

When libcurl knows the total amount of bytes to be transferred (for example when the server announced a Content-Length: header in its response), we adjust the token bucket.

We want the last “step” to handle only a small fraction of the total transfer (1%, at most 4 KB). We then adjust the length of the steps so that exactly this amount remains for the last step.

Total: 3MB, limit 1 MB/s
             Initial    Adjusted
Bucket Size: 1 MB       ~1.49 MB
Bucket Step: 1000 ms    1490 ms
  Last Step: 1 MB       4 KB
   Duration: 2.1 sec    ~3 sec

3 MB   |**....**.....**|          Initial
3 MB   |***.......***.......*|    Adjusted
time   +------+------+------+------
       0s     1s     2s     3s

And for a very small transfer:

Total: 500KB, limit 1 MB/s
             Initial    Adjusted
Bucket Size: 1 MB       496 KB
Bucket Step: 1000 ms    496 ms
  Last Step: 500 KB     4 KB
   Duration: 0.1 sec    ~500 ms

0.5MB   |*|             Initial
0.5MB   |*..*|          Adjusted
time    +------+------+------+------
        0s     1s     2s     3s
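One plausible way to compute this adjustment, reproducing the numbers in the two tables above (a hypothetical helper with made-up names, using decimal units where 1 MB = 1,000,000 bytes; the rounding differs slightly from the tables):

```python
import math

def adjust_bucket(total, limit):
    """Sketch: size the token bucket so that only a tiny "last step"
    remains at the end of a rate limited transfer of known length.
    total and limit are in bytes; returns (bucket_size, step_ms,
    last_step)."""
    # the last step should carry ~1% of the transfer, capped at 4 KB
    last_step = min(total // 100, 4 * 1000)
    # keep the number of full refill steps of the unadjusted bucket,
    # minus the final burst, and spread the remainder evenly over them
    steps = max(1, math.ceil(total / limit) - 1)
    bucket_size = (total - last_step) / steps
    # a longer refill interval keeps the effective rate at the limit
    step_ms = 1000 * bucket_size / limit
    return bucket_size, step_ms, last_step
```

For a 3 MB transfer at 1 MB/s this yields two steps of ~1.49 MB every ~1490 ms plus a 4 KB last step; for 500 KB, one step of 496 KB every 496 ms plus 4 KB.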

If you look closely, you can see that we sacrifice a little precision here for increased smoothness. We could improve this in the future by halving the steps in such cases. The first example is pretty much a worst case, though.

Multiplexing Protocols

For multiplexing protocols (which manage more than one transfer per connection, like HTTP/2 and QUIC) things are a bit more complicated.

libcurl always has to receive all incoming data on such connections. There could be one rate limited transfer next to another, unlimited one. There could be metadata arriving that needs attention.

To make rate limits work, libcurl has to tell the server to limit the amount of data it may send for a transfer. This is known as the “stream window size” in HTTP/2 and “stream max write offsets” in QUIC.

The HTTP/2 and QUIC implementations in libcurl use the tokens available in the bucket (if rate limiting is in effect) to manage the amount the server is allowed to send. (For QUIC, this only works with ngtcp2, as the quiche library does not expose the needed mechanisms in its API.)
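In simplified terms, the idea is to never grant the server more flow-control window than there are tokens left in the bucket (a hypothetical sketch of the principle, not libcurl’s actual code):

```python
def window_grant(tokens_available, already_granted):
    """Sketch: how many more bytes to grant the server for a rate
    limited stream, e.g. via an HTTP/2 WINDOW_UPDATE frame or a QUIC
    MAX_STREAM_DATA frame. Without rate limiting, a large fixed window
    would be granted instead; with it, the grant tracks the bucket."""
    return max(0, tokens_available - already_granted)
```

As the bucket drains, the grant shrinks to zero, so the server stops sending on that stream until the bucket refills, while other streams on the connection remain unaffected.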

With this, rate limiting works the same for all HTTP versions in libcurl for the upcoming 8.19.0 release.

Summary

The new rate limiting mechanism in curl 8.19.0 is available for all protocols, including HTTP/2 and HTTP/3, and is, in our opinion, a good trade-off between precision, CPU usage and smoothness. May it serve you well.

*) If you are a large streaming service and use libcurl in your clients and/or servers, you really should support the curl project. I mean, WTF?