I often see the same poor advice about performance tuning, so I want to take a moment to share what I believe is more effective guidance. As always, this is an approximately correct analysis, because a more correct one would take too many words.
So, let’s be direct: most of the time, focusing on latency isn’t very helpful — by itself latency is data that only leads to frustration. Even if latency is a major problem, such as when you have orchestration issues, simply looking at the timing often doesn’t provide actionable insights [1]. For example, if response times jump from 5 seconds to 15 seconds, that information alone doesn’t reveal the root cause. I like to say, “this is data I can only cry about”. Instead, you need to understand what’s driving those times. The answer is basically always some form of resource consumption — whether it’s CPU, memory, disk, database, network, or something else. The key is to examine what you’re consuming and how you’re consuming it. That’s what will tell you what you need to change and where your biggest improvements will come from. Ultimately, it’s all about the resources.
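To make that concrete, here is a minimal sketch of the habit I’m describing: measure wall time and resource consumption together, so the timing arrives with an explanation attached. This assumes a Unix-like system (Python’s resource module isn’t available on Windows), and profile_resources is just an illustrative helper, not a prescribed API.

```python
import resource
import time

def profile_resources(fn, *args, **kwargs):
    """Run fn, reporting wall time alongside the resources it consumed."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_SELF)

    # CPU seconds actually burned, user plus system.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    print(f"wall time: {wall:.3f} s")
    print(f"cpu time : {cpu:.3f} s ({cpu / wall:.0%} of wall)")
    print(f"max rss  : {after.ru_maxrss} (kB on Linux, bytes on macOS)")
    return result
```

If CPU time accounts for nearly all of the wall time, you’re compute-bound; if it accounts for a small fraction, the time went to waiting on something, and identifying that something is the actual job.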
Importance of Understanding Resource Consumption
A classic mistake is to stop analyzing resource consumption as soon as you draw a conclusion like, “Oh, I’m disk bound.” While it’s certainly possible that disk is your critical resource, identifying that doesn’t mean you can ignore other types of consumption. It means it’s even more important to understand that disk consumption thoroughly. You should know which files are being read, in what order, and whether those reads are necessary. In short, you need a complete understanding of how your critical resource is consumed, because it is, after all, critical.
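As a crude sketch of that kind of visibility, you can log every file the process opens. On a real system you’d more likely reach for a tracer like strace, but the idea is the same; the wrapper below is hypothetical and only catches opens that go through Python’s own open function:

```python
import builtins

_real_open = builtins.open

def traced_open(file, mode="r", *args, **kwargs):
    # Record which files are touched and in what order, so redundant or
    # unnecessary reads become visible instead of just "the disk is busy".
    print(f"open: {file!r} mode={mode!r}")
    return _real_open(file, mode, *args, **kwargs)

builtins.open = traced_open  # every open() from here on is logged
```

Once you can see the read pattern, questions like “why is this file read three times?” or “why are these reads not sequential?” practically answer themselves.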
Perhaps most importantly, other resources in your system can still be worth optimizing. Suppose your system’s front end is compute-bound, but the main source of latency is a back-end database. You might think you can ignore the front end because it’s already fast enough, and you’re waiting on the database anyway.
That’s a flawed approach.
In today’s cloud environments, you might be running 50, 500, or even 5,000 front-end servers. If you can achieve a 50% compute savings on the front end, you can halve the number of servers from 5,000 to 2,500. That’s a significant cost saving, even if it doesn’t reduce latency for your users. And you should pursue these savings, even if your customers never notice the difference.
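The arithmetic is worth spelling out. Using a made-up per-server cost, and assuming the fleet is compute-bound and scales roughly linearly:

```python
servers = 5_000
monthly_cost_per_server = 250.0   # hypothetical figure

cpu_reduction = 0.50              # the 50% compute savings above
remaining = int(servers * (1 - cpu_reduction))

savings = (servers - remaining) * monthly_cost_per_server
print(f"{servers} -> {remaining} servers: ${savings:,.0f}/month saved")
# 5000 -> 2500 servers: $625,000/month saved
```

Even a 5% reduction frees 250 servers in this scenario, which is the point of the next section.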
Resource Analysis and Marginal Reduction
The logic above holds whether you’re reducing from two servers to one, two threads to one, or cutting CPU usage by 25%, 10%, or even 5%. The potential cost savings can be substantial. The same is true on personal devices: if your laptop workload is network-bound for whatever reason, but you have an opportunity to reduce CPU usage by 15% or 20%, that’s still worth pursuing. Why? Because you rarely have exclusive access to all CPU resources; other users or processes can benefit from any reduction.
Even if you “own the whole machine”, you likely have other tasks you want to run. Lower CPU usage gives you headroom for improvements that users will appreciate, such as a better UI or more responsive features. Saving CPU also means you’re using less power, allowing the CPU to idle more, which is particularly beneficial on devices with many cores. It’s frustrating when a process needlessly consumes excessive resources, preventing users from getting other meaningful work done.
Time Is a Consequence, Not a Cause
Whether it’s about saving battery, lowering heat dissipation, or just creating room for other applications, there are many reasons why reducing resource consumption makes sense. In nearly every scenario, it’s a matter of balancing the effort required with potential savings.
It’s important to remember, and this isn’t new advice, that latency should be considered a secondary metric. The primary metrics are all about resource consumption. Time is a consequence, not a cause, and that makes it one of the hardest things to understand and predict. It’s always better to relate time back to some form of resource usage.
Software Resources
To add more force to all of this, remember that any time you wrap a critical section around some computation, you’re effectively creating a software resource. It has a queue length, an average service time, and so forth. It can be measured just like a disk or any other physical resource.
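Here’s a minimal sketch of that idea: a hypothetical MeasuredLock wrapper that exposes the same metrics you would collect for a disk, namely instantaneous queue length and average service time.

```python
import threading
import time

class MeasuredLock:
    """A lock instrumented like a physical resource: it tracks how many
    threads are waiting (queue length) and how long the critical
    section runs once acquired (service time)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats = threading.Lock()  # protects the counters below
        self.waiting = 0                # instantaneous queue length
        self.holds = 0
        self.total_service = 0.0

    def __enter__(self):
        with self._stats:
            self.waiting += 1
        self._lock.acquire()
        with self._stats:
            self.waiting -= 1
        self._entered = time.perf_counter()  # only the holder touches this
        return self

    def __exit__(self, *exc):
        service = time.perf_counter() - self._entered
        self._lock.release()
        with self._stats:
            self.holds += 1
            self.total_service += service

    def average_service_time(self):
        with self._stats:
            return self.total_service / self.holds if self.holds else 0.0
```

With numbers like these, a contended lock stops being a mystery: either too many threads are queuing for it, or the critical section is doing too much work per hold.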
Thinking of your system as a network of queues, and analyzing the work performed in each, is extremely helpful. Reducing the average service time in each of these areas is a reliable strategy for improvement. Your job is to determine what service time looks like and why it might be high — perhaps due to inefficient data access patterns or poor asynchronous ordering. The queue network model also helps to reveal orchestration possibilities, like pipelining and parallelism.
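The classic single-server queue formula shows why attacking service time is so effective. Real systems are not M/M/1 queues, so treat this as a sketch of the shape of the curve rather than a prediction: near saturation, a small reduction in service time buys an outsized reduction in total time in the system.

```python
def mm1_time_in_system(arrival_rate, service_time):
    """Average time in an M/M/1 queue: W = S / (1 - rho),
    where utilization rho = arrival_rate * service_time."""
    rho = arrival_rate * service_time
    assert rho < 1, "unstable: utilization must stay below 100%"
    return service_time / (1 - rho)

# 80 requests/sec at 10 ms of service each: 80% utilized.
print(mm1_time_in_system(80, 0.010))  # ~0.050 s in system
# Cut service time by 20% (to 8 ms) and utilization falls to 64%:
print(mm1_time_in_system(80, 0.008))  # ~0.022 s, better than 2x
```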
By understanding and addressing these issues, you’ll make meaningful progress. Whenever you can reduce consumption, it’s a win.
Value of Frugality and Broad Optimization
In short, don’t disregard resource savings just because they don’t relate to the current critical bottleneck. Often, there are additional benefits to reducing consumption. Being frugal with resources is almost always a universally good policy. Latency isn’t everything.
[1]: People often have success with fancy orchestration diagrams that subdivide and nest the timing regions clearly. This basically works because, if you subdivide enough, you can reasonably know by inspection what code is in each small section and what it does, which leads you to the underlying consumption or queuing problem.