Latency Sneaks Up on You

brooker.co.za

135 points by luord 4 years ago · 26 comments

MatteoFrigo 4 years ago

Great article, and a line of reasoning that ought to be more widely known. There is a similar tradeoff between latency and utilization in hash tables, for essentially the same reason.

The phenomenon described by the author can lead to interesting social dynamics over time. The initial designer of a system understands the latency/utilization tradeoff and dimensions the system to be underutilized so as to meet latency goals. Then the system is launched and successful, so people start questioning the low utilization, and apply pressure to increase utilization in order to reduce costs. Invariably, latency goes up and customers complain. Customers escalate, and projects are started to reduce latency. People screw around at the margins, changing the number of threads and so on, but the fundamental tradeoff cannot be avoided. Nobody is happy in the end. (Been through this cycle a few times already.)
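
A quick way to see the tradeoff is the textbook M/M/1 queue (a deliberate simplification, not the article's exact model): mean time in the system is 1/(mu - lambda), which blows up as utilization rho = lambda/mu approaches 1. A minimal sketch in Python, with made-up numbers:

    # Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda),
    # where utilization rho = lambda / mu. Service rate is fixed; we sweep arrivals.
    MU = 100.0  # service rate, requests per second (made-up number)

    def mean_latency(rho: float, mu: float = MU) -> float:
        """Mean time in system at utilization rho (0 <= rho < 1)."""
        lam = rho * mu
        return 1.0 / (mu - lam)

    for rho in (0.1, 0.5, 0.8, 0.9, 0.95, 0.99):
        print(f"utilization {rho:4.2f} -> mean latency {mean_latency(rho) * 1000:7.1f} ms")

    # Roughly: 11 ms at 10% utilization, 100 ms at 90%, 1000 ms at 99%.
    # The last few points of utilization cost far more latency than the first 50%.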

  • dharmab 4 years ago

    In my org, we define latency targets early (based on our users' needs where possible) and then our goal is to maximize utilization within those constraints.

wpietri 4 years ago

Mostly agreed, and I think the point about efficiency working against latency is both important and widely ignored. And not just in software, but software process.

There's a great book called Principles of Product Development Flow. It carefully looks at the systems behind how things get built. Key to any good feedback loop is low latency. So if we want our software to get better for users over time, low latencies from idea to release are vital. But most software processes are tuned for keeping developers 100% busy (or more!), which drastically increases system latency. That latency means we get a gain in efficiency (as measured by how busy developers are) but a loss in how effective the system is (as determined by creation of user and business value).

azundo 4 years ago

This principle applies as much to the work we schedule for ourselves (or our teams) as it does to our servers.

As teams get pushed to efficiently utilize scarce and expensive developer resources to the max, they can also end up with huge latency issues for unanticipated requests. It's not always easy to justify keeping planned work well under a team's capacity, though, even if it leads to better overall outcomes at the end of the day.

  • mjb 4 years ago

    Yes, for sure.

    As another abstract example that's completely disconnected from the real world: if we're running the world's capacity to make n95 masks at high utilization, it may take a while to be able to handle a sudden spike in demand.

jrochkind1 4 years ago

Mostly a reminder/clarification of things I knew, but a good and welcome one, well stated, because I probably sometimes forget. (I don't do performance work a lot.)

But this:

> If you must use latency to measure efficiency, use mean (avg) latency. Yes, average latency

Not sure if I ever thought about it before, but after following the link[1] where OP talks more about it, they've convinced me. Definitely want mean latency at least in addition to median, not median alone.

[1]: https://brooker.co.za/blog/2017/12/28/mean.html

  • marcosdumay 4 years ago

    > at least in addition to median

    There was an interesting article here not long ago that made the point that the median is basically useless. If you load 5 resources on a page load, the odds that all of them are faster than the median (so that it represents the user experience) are about 3%. You need a very high percentile to get any useful information, probably one with a number of 9s.
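
    For the record, that 3% comes from assuming the five load times are independent and each one is equally likely to land on either side of the median:

        # Probability that all 5 independent loads beat the median latency
        p_all_fast = 0.5 ** 5
        print(f"{p_all_fast:.1%}")  # 3.1%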

    • jrochkind1 4 years ago

      Median for a particular action/page might be more useful.

      • marcosdumay 4 years ago

        No doubt about that (even then, you will probably want the 90th or 99th percentile, depending on how many interactions you expect a person to have).

        The real median is just very hard to measure, and an easier 99.99th (with more 9's as needed) percentile is almost as good.

        • jrochkind1 4 years ago

          Can you say more about why you say the "real median" is hard to measure? It doesn't seem hard to measure to me, or any harder than a 99.99 percentile. Why is 50th percentile harder to measure than 99.99th?

  • zekrioca 4 years ago

    > If we're expecting 10 requests per second at peak this holiday season, we're good.

    Problem is, sometimes system engineers do not know what to expect, but they still need to have a plan for this case.

dwohnitmok 4 years ago

I think the article is missing one big reason why we care about 99.99% or 99.9% latency metrics and that is that we can have high latency spikes even with low utilization.

The majority of computer systems do not deal with high utilization. As has been pointed out many times, computers are really fast these days, and many businesses may be able to get through their entire lifetime on a single machine if the underlying software makes efficient use of the hardware resources. And yet even with low utilization, we still have occasional high latency that occurs often enough to frustrate a user. Why is that? Because a lot of software these days is based on a design that intersperses low-latency operations with occasional high-latency ones. This shows up everywhere: garbage collection, disk and memory fragmentation, growable arrays, eventual consistency, soft deletions followed by actual hard deletions, etc.

What this article is advocating for is essentially an amortized analysis of throughput and latency, in which case you do have a nice and steady relationship between utilization and latency. But in a system which may never come close to full utilization of its underlying hardware resources (which is a large fraction of software running on modern hardware), this amortized analysis is not very valuable because even with very low utilization we can still have very different latency distributions due to the aforementioned software design and what tweaks you make to that.

This is why many software systems don't care about the median latency or the average latency, but care about the 99th or 99.9th percentile latency: there is a utilization-independent component to the statistical distribution of your latency over time, and for the many software systems that have low utilization of hardware resources, that component is the main determinant of your overall latency profile, not utilization.
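
A small simulation of that kind of distribution (all numbers made up): a lightly loaded service where a small fraction of requests hit a GC-style pause. The mean barely moves, but the tail percentiles are set entirely by the pauses, not by how busy the machine is:

    import random

    random.seed(0)
    # 2% of requests hit a 250 ms pause (GC, compaction, resize, ...);
    # the rest take about 5 ms. Utilization plays no role in this distribution.
    latencies_ms = [250.0 if random.random() < 0.02 else random.gauss(5.0, 1.0)
                    for _ in range(100_000)]

    latencies_ms.sort()
    n = len(latencies_ms)
    mean = sum(latencies_ms) / n
    p50, p99, p999 = latencies_ms[n // 2], latencies_ms[int(n * 0.99)], latencies_ms[int(n * 0.999)]
    print(f"mean={mean:.1f} ms  p50={p50:.1f} ms  p99={p99:.1f} ms  p99.9={p999:.1f} ms")
    # Roughly: mean ~10 ms, p50 ~5 ms, p99 = p99.9 = 250 ms.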

  • MatteoFrigo 4 years ago

    Even worse, the effects that you mention (garbage collection, etc.) are morally equivalent to an increase in utilization, which pushes you towards the latency singularity that the article is talking about.

    As an oversimplified example, suppose that your system is 10% utilized and that $BAD_THING (gc, or whatever) happens that effectively slows down the system by a factor of 10 at least temporarily. Your latency does not go up by 10x---it grows unbounded because now your effective utilization is 100%.
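
    Putting rough numbers on that with a simple M/M/1-style model (my numbers, not anything from the article):

        # Before: service rate 100 req/s, arrival rate 10 req/s.
        mu, lam = 100.0, 10.0
        print(lam / mu)        # utilization 0.10; mean latency 1/(mu - lam) ~= 11 ms

        # $BAD_THING slows the server down by 10x: effective service rate is 10 req/s.
        mu_eff = mu / 10.0
        print(lam / mu_eff)    # effective utilization 1.0: arrivals now equal capacity,
                               # so the queue stops draining and latency grows without
                               # bound for as long as the slowdown lasts.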

shitlord 4 years ago

If you're interested, there's a whole branch of mathematics that models these sorts of phenomena: https://en.wikipedia.org/wiki/Queueing_theory

ksec 4 years ago

OK. I am stupid. I don't understand the article.

>> If you must use latency to measure efficiency, use mean (avg) latency. Yes, average latency

What is wrong with measuring latency at the 99.99th percentile, with a clear guideline that optimising for efficiency (in this article, higher utilisation) should not come at the cost of latency?

Because latency is part of user experience. And UX comes first before anything else.

Or does it imply that there are a lot of people who don't know the trade-off between latency and utilisation? Because I don't know anyone who runs utilisation at 1 or even 0.5 in production.

  • mjb 4 years ago

    The other two answers you got are good. I will say that monitoring p99 (or 99.9 or whatever) is a good thing, especially if you're building human-interactive stuff. Here's my colleague Andrew Certain talking about how Amazon came to that conclusion: https://youtu.be/sKRdemSirDM?t=180

    But p99 is just one summary statistic. Most importantly, it's a robust statistic that rejects outliers. That's a very good thing in some cases. It's also a very bad thing if you care about throughput, because throughput is proportional to 1/latency, and if you reject the outliers then you'll overestimate throughput substantially.

    p99 is one tool. A great and useful one, but not for every purpose.
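
    A toy version of that overestimate (made-up numbers), using the fact that a single worker's throughput is roughly 1 / mean latency:

        # 99 fast requests plus one slow outlier, handled serially by one worker.
        latencies_s = [0.010] * 99 + [1.0]

        mean_all = sum(latencies_s) / len(latencies_s)       # 0.0199 s
        mean_trimmed = sum(sorted(latencies_s)[:99]) / 99    # 0.010 s, outlier rejected

        print(f"{1 / mean_all:.1f} req/s")      # ~50 req/s using the full mean
        print(f"{1 / mean_trimmed:.1f} req/s")  # 100 req/s if the tail is ignored
        # Rejecting one outlier in a hundred doubles the apparent throughput here.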

    > Because I don't know anyone who runs utilisation at 1 or even 0.5 in production.

    Many real systems like to run much hotter than that. High utilization reduces costs, and reduces carbon footprint. Just running at low utilization is a reasonable solution for a lot of people in a lot of cases, but as margins get tighter and businesses get bigger, pushing on utilization can be really worthwhile.

    • bostik 4 years ago

      In my previous job, I and the latency-sensitive engineering teams in general mostly went with just four core latency measurements.[ß]

          - p50, to see the baseline
          - p95, to see the most common latency peaks
          - p99, to see what the "normal" waiting times under load were
          - max, because that's what the most unfortunate customers experienced
      
      In a normal distributed system the spread between p99 and max can be enormous, but the mindset of ensuring a smooth customer experience, with the awareness that a real person had to wait that long, is exceptionally useful. You need just one slightly slower service for the worst-case latency to skyrocket. In particular, GraphQL is exceptionally bad at this without real discipline - the minimum request latency is dictated by the SLOWEST downstream service.

      To be fair, it was a real-time gambling operation. And we were operating within the first Nielsen threshold.

      ß: bucketing these by request route was quite useful.

      EDIT: formatting
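
      A rough sketch of that per-route bucketing (hypothetical route names, hand-rolled nearest-rank percentiles; a real setup would lean on your metrics library):

          from collections import defaultdict

          def percentile(sorted_vals, p):
              """Nearest-rank percentile of an already-sorted list, 0 < p <= 100."""
              idx = max(0, round(p / 100 * len(sorted_vals)) - 1)
              return sorted_vals[idx]

          # (route, latency_ms) samples; routes are made up for illustration.
          samples = [("/bet/place", 42.0), ("/bet/place", 55.0), ("/bet/place", 480.0),
                     ("/wallet/balance", 8.0), ("/wallet/balance", 9.5), ("/wallet/balance", 120.0)]

          by_route = defaultdict(list)
          for route, ms in samples:
              by_route[route].append(ms)

          for route, vals in sorted(by_route.items()):
              vals.sort()
              print(route, "p50", percentile(vals, 50), "p95", percentile(vals, 95),
                    "p99", percentile(vals, 99), "max", vals[-1])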

  • srg0 4 years ago

    Percentiles are order statistics; they are robust and not sensitive to outliers. That is why they are often very useful, and also why they do not capture how big the remaining 0.01% of the data is.

    Let's take the median, which is also an order statistic, and a sequence of latency measurements: 0.005 s, 0.010 s, 3600 s. The median latency is 0.010 s, and this number does not tell you how bad latency can actually be. The mean latency is 1200.005 s, which is far more indicative of how bad the worst case is.

    In other words, percentiles show how often a problem happens (does not happen). Mean values show the impact of the problem.

  • dastbe 4 years ago

    every statistic is a summary and a lie. p50/p99 metrics are good in the sense that they tell you a number someone actually experienced and they put an upper bound on that experience. they are bad because they won't tell you how the experience below that bound looks.

    mean and all its variants won't show you a number that someone in your system necessarily experienced, but they will incorporate the entire distribution and show when it has changed.

    in the context of efficiency, mean is beneficial because it can be used to measure concurrency in a system via Little's law, and will signal changes in your concurrency that a percentile metric won't necessarily catch.
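
    concretely, Little's law says mean concurrency = arrival rate × mean latency, so a shift in the mean translates directly into how many requests are in flight (illustrative numbers only):

        # Little's law: L = lambda * W
        arrival_rate = 500.0     # requests per second
        mean_latency_s = 0.020   # 20 ms mean latency

        print(arrival_rate * mean_latency_s)   # 10 requests in flight on average

        # If mean latency drifts to 30 ms, average concurrency rises to 15,
        # even if p99 stays flat; that's the efficiency signal the mean carries.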

azepoi 4 years ago

"How not to measure latency" video by Gil Tene

https://youtu.be/lJ8ydIuPFeU

dvh 4 years ago

Grace Hopper explaining 1 nanosecond: https://youtu.be/9eyFDBPk4Yw

the_sleaze9 4 years ago

Seems like a trivially simple article, and I remain unconvinced of the conclusion. I think this is a confident beginner giving holistic, overly prescriptive advice. That is to say: feel free to skip and ignore.

In my experience, if you want monitoring (or measuring for performance) to provide any value whatsoever, you must measure multiple different aspects of the system all at once: percentiles, averages, load, responses, I/O, memory, etc.

The only time you would need a single metric would possibly be for alerting, and a good alert (IMHO) is one that triggers for impending doom, which the article states percentiles are good for. But I think alerts are outside of the scope of this article.

TLDR; Review of the article: `Duh`

  • MatteoFrigo 4 years ago

    Your characterization of Marc Brooker as a "confident beginner" is incorrect. The guy is a senior principal engineer at AWS, was the leader of EBS when I interacted with him, and has built more systems than I care to mention. The phenomenon he is describing is totally real. Of course the article is a simplification that attempts to isolate the essence of a terrifyingly complex problem.

    • LambdaComplex 4 years ago

      And it even says as much in the sidebar on that page:

      > I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I lead engineering on AWS Lambda and our other serverless products. Before that, I worked on EC2 and EBS.
