How Not to Measure Computer System Performance
homes.cs.washington.edu

Benchmarking practices are currently poor, almost without exception. For peak VM performance, we have started to use the Kalibera/Jones method http://kar.kent.ac.uk/33611/7/paper.pdf (we reimplemented the statistical computations at http://soft-dev.org/src/libkalibera/ to make them more accessible). I don't think this method is the end of the story, but it's a definite improvement: we were surprised at some of the odd effects it highlighted (non-determinism was not what I was expecting). It has definitely changed how I think about benchmarking.
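Not the Kalibera/Jones method itself, but the basic idea it builds on — reporting run-to-run variation with a confidence interval rather than a single mean — can be sketched with a plain percentile bootstrap (the timing numbers below are made up for illustration):

```python
import random

def bootstrap_ci(samples, iters=10_000, alpha=0.05, seed=1):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    means = sorted(
        sum(rng.choice(samples) for _ in range(n)) / n
        for _ in range(iters)
    )
    lo = means[int((alpha / 2) * iters)]
    hi = means[int((1 - alpha / 2) * iters)]
    return lo, hi

# Hypothetical wall-clock times (seconds) from repeated runs of one benchmark.
times = [1.02, 0.98, 1.10, 1.01, 0.97, 1.25, 0.99, 1.03]
lo, hi = bootstrap_ci(times)
print(f"mean={sum(times) / len(times):.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The real method goes further (it models nesting of iterations within runs within builds), but even this much makes a lone "best of three" number look as shaky as it is.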
Great article. The gist I get from it is: running an experiment in computer science is easy (just ./bench); running an experiment in computer science correctly is hard. I agree with this assessment.
This plotty tool [0] seems interesting and valuable - but I'm not sure how it relates to the problem the author talks about.
Why would linking order affect runtime performance? Something to do with the interaction between offsets and cache, maybe?
Would it be possible to determine ahead of time what order would maximize performance, or would that require profiling?
I'd speculate that if you're unlucky about link order, two hot cache lines may get mapped to the same slot in an N-way associative cache -- whereas if you're lucky, they end up going to different slots and don't continuously evict each other.
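That conflict scenario is just modular arithmetic on addresses. A toy calculation (cache geometry and addresses are made up, not measured from a real binary) shows how one link-order spacing puts two hot lines in the same set while another doesn't:

```python
def cache_set(addr, line=64, sets=64):
    """Set index an address maps to in a toy set-associative cache."""
    return (addr // line) % sets

hot_a = 0x401000              # hypothetical address of one hot function
hot_b = hot_a + 3 * 64 * 64   # unlucky spacing: a multiple of sets * line
print(cache_set(hot_a), cache_set(hot_b))    # same set -> they evict each other

hot_b2 = hot_a + 17 * 64      # luckier spacing after reordering objects
print(cache_set(hot_a), cache_set(hot_b2))   # different sets -> no conflict
```

With N ways per set you need N+1 hot lines landing in one set before thrashing starts, but the mechanism is the same.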
With regards to alignment... do linkers typically pack objects so tightly that the start of each object isn't aligned on a cache line boundary? AFAIK cache lines are typically 32, 64, or 128 bytes.
Cacheline boundary?
Probably, because cache line sizes are an implementation detail, not part of the architectural specification.
Linking order affects the layout of code and static data, and hence their cache alignment. Similarly, environment variables are pushed onto the stack by the kernel, so their total size affects the alignment of application data on the stack.
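A toy model of that second effect (the addresses and sizes are made up, and real kernels add padding and randomization on top): the environment strings sit at the top of the initial stack, so their total size shifts where everything below them lands relative to cache lines — the ABI only guarantees 16-byte stack alignment, not 64:

```python
def stack_data_offset(env_bytes, stack_top=0x7FFF_FFFF_F000, line=64):
    """Offset within a cache line of stack data, in this toy model."""
    # Env strings occupy the top `env_bytes`; the stack pointer starts
    # below them, rounded down to the ABI's 16-byte alignment.
    sp = (stack_top - env_bytes) & ~0xF
    return sp % line

print(stack_data_offset(200))    # small environment
print(stack_data_offset(1000))   # a few long exported variables
```

So exporting one long variable before a benchmark run can silently move your hot stack data to a different position within (or across) cache lines.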
Maybe there are other causes as well.
> Would it be possible to determine ahead of time what order would maximize performance, or would that require profiling?
I think at the very least, you'd need profiling to determine the hot code path, and that can change depending on input...
Just guessing, but locality of reference might play a role.
That, the branch predictors, and caching behaviors, n-way, alignment, etc.
It probably explains why different runs vary so widely; I always assumed it was other things going on in the OS, and never really thought about the caches, etc.
> That, the branch predictors, and caching behaviors, n-way, alignment, etc.
Those all fall under locality of reference, btw. But yeah, cache and branch prediction play a huge role.
One thing that stung me in the past was OS scheduling.
There are lies, damned lies, and software benchmarks.
Lies, damn lies, and $100 million investments.
I think the author has computer science and computer engineering confused.
I don't think he does. What he describes is part of what my colleagues and I consider "computer science." We typically consider "computer engineering" to be the design and making of hardware. But to be a systems researcher in computer science, you must know how these things work, and be able to reason about how they affect the software systems you care about.