Beyond malloc efficiency to fleet efficiency (cloud.google.com)
> As an example of the benefits of this approach, one service increased its time in TCMalloc from 2.7% to 3.5%, an apparent regression, but reaped improvements of 3.4% more requests-per-second, a 1.7% latency reduction, and a 6.5% reduction in peak memory usage!
This is the stuff of performance nightmares. Anyone who spends a lot of time on optimization will tend to fixate on the apparent regression there (more time in TCMalloc) and may never notice the improved overall performance (requests per second).
Agreed. Microbenchmarking can be detrimental if you don't verify with some 'macro' benchmarking with realistic use cases.
And tracking the right metrics!
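To put rough numbers on why the allocator's share is the wrong metric to watch in isolation, here is a back-of-the-envelope sketch using the figures from the quote. It assumes (this is not stated in the quote) that the service is CPU-bound and ran on the same CPU budget before and after, so CPU cost per request scales as 1 / requests-per-second. All variable names are just for illustration.

```python
# Figures taken from the quoted passage.
malloc_share_before = 0.027  # fraction of CPU time spent in TCMalloc, before
malloc_share_after = 0.035   # fraction of CPU time spent in TCMalloc, after
rps_gain = 0.034             # 3.4% more requests per second

# ASSUMPTION: CPU-bound service on a fixed CPU budget, so the CPU cost of
# one request is inversely proportional to throughput.
cost_before = 1.0                          # normalized CPU per request
cost_after = cost_before / (1.0 + rps_gain)

# Split the per-request cost into allocator time and everything else.
malloc_before = cost_before * malloc_share_before
malloc_after = cost_after * malloc_share_after
other_before = cost_before - malloc_before
other_after = cost_after - malloc_after

def pct_change(after: float, before: float) -> float:
    """Relative change from before to after, in percent."""
    return 100.0 * (after / before - 1.0)

print(f"total CPU per request:     {pct_change(cost_after, cost_before):+.1f}%")
print(f"allocator CPU per request: {pct_change(malloc_after, malloc_before):+.1f}%")
print(f"other CPU per request:     {pct_change(other_after, other_before):+.1f}%")
```

Under that assumption the allocator does roughly a quarter more work per request, while everything outside the allocator gets about 4% cheaper, which is where the net win comes from. A profile that only tracks "time in malloc" would show just the first number.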
> In Google’s data centers, this improvement reduced TLB stalls by 6% and memory fragmentation by 26%.
Yet after Ctrl+F-ing the paper for the term, I still can't find a precise definition of "fragmentation". Given that fragmentation is an allocator's major enemy, it bugs me that there seems to be no universally agreed-upon formulation yet.
Does anyone more knowledgeable have a more informed opinion on the matter?
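Not an authoritative answer, but one working definition that comes up often is memory overhead: bytes the allocator holds from the OS that are not currently backing live application objects, expressed relative to the live bytes. Whether the paper uses exactly this formulation is an assumption on my part. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def fragmentation_ratio(heap_bytes: int, live_bytes: int) -> float:
    """Overhead-style fragmentation: bytes held by the allocator but not
    live in the application, as a fraction of live bytes.

    This is ONE common formulation, assumed here for illustration; it is
    not claimed to be the definition used in the paper.
    """
    if live_bytes <= 0:
        raise ValueError("live_bytes must be positive")
    if heap_bytes < live_bytes:
        raise ValueError("heap_bytes cannot be smaller than live_bytes")
    return (heap_bytes - live_bytes) / live_bytes


# Made-up example: the process holds 1280 MiB from the OS,
# of which 1000 MiB is live application data.
MiB = 2**20
ratio = fragmentation_ratio(heap_bytes=1280 * MiB, live_bytes=1000 * MiB)
print(f"fragmentation overhead: {ratio:.0%}")
```

Note that this lumps together internal fragmentation (size-class rounding), external fragmentation (free space that can't be reused or returned), and allocator metadata and caches, which is part of why different papers end up with different numbers for the "same" thing.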
Seems like a bit of a tragedy of the commons. Individual containers benefit from a faster malloc, but the whole fleet benefits from one doing more work.