What is the “Thundering herd” problem?
The thundering herd problem is a performance issue that occurs when many processes or threads are waiting for a resource to become available and, the moment it does, all rush to acquire it at once. The sudden spike in demand creates a bottleneck, which can result in poor performance, high latency, and even system crashes.
The term “thundering herd” comes from the analogy of a large group of animals all running toward the same location at once, causing chaos and congestion.
This problem can occur in many different types of systems, such as network servers, databases, and distributed systems. For example, imagine a database that stores user account information. If many users simultaneously try to access their accounts, and the database is not designed to handle such a high level of concurrent requests, then a thundering herd problem can occur, resulting in slow response times or even system failures.
Facebook Case study
FB rolled out a Live video feature that allows verified public figures to use Mentions to broadcast live video to their fans on Facebook.
Problem statement
Public figures can have millions of followers who may all try to watch a live video at once, so the system must handle the load of incoming requests for the stream arriving at the same time. This flood of simultaneous requests is the “thundering herd” problem: too many requests can stampede the system, causing lag and dropped connections.
How to handle this problem
Several techniques can be used to handle the thundering herd problem.
1. Caching: Store frequently accessed data in memory or on disk so it can be served without touching the underlying resource. Serving hot data from a cache reduces load on the resource and makes a stampede less likely.
2. Load balancing: Distribute incoming requests across multiple servers so that no single server becomes overwhelmed by a sudden surge.
3. Queuing: Place incoming requests in a queue and process them in a controlled, orderly manner, smoothing out spikes instead of letting them hit the resource all at once.
4. Throttling: Limit the rate at which requests are processed, so the resource is never asked to do more work than it can handle.
5. Connection pooling: Reuse existing connections to a resource rather than opening a new connection for each request, reducing per-request setup cost and load on the resource.
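To make the throttling idea concrete, here is a minimal token-bucket rate limiter in Python. The class name and parameters are illustrative, not any particular library's API:

```python
import threading
import time

class TokenBucket:
    """Token-bucket throttle: allows at most `rate` requests per second
    on average, with bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self):
        """Return True if a request may proceed, False if it is throttled."""
        with self.lock:
            now = time.monotonic()
            # Refill tokens for the elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
# The first 10 requests fit in the burst capacity; the rest are throttled.
```

Requests that return False can be rejected, retried with backoff, or placed in a queue, which is how throttling and queuing are often combined in practice.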
How FB solved this problem
Facebook has implemented several strategies to solve the thundering herd problem, which is a common problem for large-scale social media platforms. One such strategy is edge caching, which involves storing frequently accessed data on edge servers closer to the end user, rather than on the central server. This reduces the load on the central server and helps to prevent a sudden surge in demand.
Another strategy used by Facebook is the asynchronous loading of data. This involves loading only the necessary data first and then fetching additional data asynchronously in the background. This helps to reduce the amount of data that needs to be loaded at once and ensures that resources are used more efficiently.
Facebook also uses load balancing and auto-scaling to distribute the load across multiple servers and ensure that resources are used efficiently. Load balancing ensures that incoming requests are distributed evenly across multiple servers, while auto-scaling allows for the addition or removal of servers based on the level of demand.
Furthermore, Facebook has implemented a technique called request coalescing, which involves collapsing multiple requests for the same data into a single request. This helps to prevent a thundering herd problem from occurring by reducing the number of requests made to the server.
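Request coalescing can be sketched as a “single-flight” wrapper: the first caller for a key performs the fetch, while concurrent callers for the same key wait and share its result. This is an illustrative Python sketch, not Facebook's actual implementation:

```python
import threading
import time

class SingleFlight:
    """Request coalescing: concurrent callers asking for the same key
    share a single underlying fetch instead of each hitting the backend."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}            # key -> (done event, result box)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # First caller becomes the leader and performs the fetch.
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        done, box = entry
        if leader:
            try:
                box["value"] = fetch()
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()
        else:
            done.wait()                # wait for the leader's result
        return box["value"]

backend_calls = []

def slow_fetch():
    backend_calls.append(1)            # count how often the backend is hit
    time.sleep(0.2)                    # simulate a slow origin fetch
    return "segment-bytes"

sf = SingleFlight()
barrier = threading.Barrier(20)
results = []

def worker():
    barrier.wait()                     # release all 20 callers at once
    results.append(sf.do("seg1", slow_fetch))

threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All 20 callers receive the segment, but the backend is fetched only once.
```

The same pattern appears in production tooling, for example Go's `singleflight` package, so 20 concurrent cache misses for one video segment collapse into a single origin request.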
Understanding the flow
A video is split into multiple segments, so a user requests one segment at a time.
The best way to avoid this is to stop requests at the initial gates. FB uses a CDN (Content Delivery Network), which is well suited to serving large amounts of static media. When a request comes in, the CDN first checks the edge server for that segment; if the segment is available, the edge server starts serving it. Otherwise, the request goes to the backend (origin) server, and the edge server caches the segment locally and serves it from there for subsequent requests. Akamai and Cloudflare are examples of such edge networks.
An edge cache server can receive millions of requests. It first checks whether the requested segment is present in its cache; if so, it responds directly. If not, instead of returning a cache miss for every request, it returns a cache miss only for the first request, puts all the other requests in a queue, and sends an HTTP request to the origin server, which is itself a cache server with the same architecture. Once the response arrives, every request waiting in the queue is answered as a cache hit. The origin cache in turn runs the same mechanism to handle requests from multiple edge caches.
If the segment is present in the origin server's cache, the response is returned directly and the edge cache is updated as well. If not, the request is sent to the live stream server and the response is propagated back through each cache layer. This way only a small fraction of the original requests ever reach the live stream server, and all other requests are answered faster.
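The layered lookup described above (edge cache, then origin cache, then live stream server) can be sketched like this. The names are illustrative, and the real system also coalesces concurrent misses at each layer:

```python
source_hits = []

def live_stream_server(segment_id):
    """Stand-in for the live stream server: the source of truth."""
    source_hits.append(segment_id)     # count how often the source is hit
    return f"bytes-of-{segment_id}"

class TieredCache:
    """One cache layer that falls back to the next layer on a miss and
    stores the result locally, so later requests stop at this layer."""

    def __init__(self, backend):
        self.cache = {}
        self.backend = backend         # next cache tier, or a callable source

    def get(self, segment_id):
        if segment_id in self.cache:
            return self.cache[segment_id]      # cache hit: serve directly
        # Cache miss: fetch from the next layer and cache on the way back.
        if callable(self.backend):
            value = self.backend(segment_id)
        else:
            value = self.backend.get(segment_id)
        self.cache[segment_id] = value
        return value

origin = TieredCache(live_stream_server)   # origin cache in front of the source
edge = TieredCache(origin)                 # edge cache in front of the origin

first = edge.get("seg-1")    # misses at edge and origin, hits the source
second = edge.get("seg-1")   # served straight from the edge cache
# The live stream server is contacted only once for seg-1.
```

Because each tier populates itself on the way back, every later request for the same segment is absorbed by the outermost layer that has it, and only first-time misses travel all the way to the stream server.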
This worked well for FB, except that many requests were still getting past the edge server. To handle these, FB used the request coalescing technique.
Summary
In summary, Facebook has solved the thundering herd problem by implementing strategies such as edge caching, asynchronous loading of data, load balancing, auto-scaling, and request coalescing. These strategies have helped to reduce the load on the central server, distribute the load across multiple servers, and ensure that resources are used efficiently, resulting in improved performance and a better user experience.