How Not to Measure Computer System Performance

homes.cs.washington.edu

97 points by sidereal 11 years ago · 13 comments

ltratt 11 years ago

Benchmarking practices are currently poor, almost without exception. For peak VM performance, we have started to use Kalibera/Jones's method http://kar.kent.ac.uk/33611/7/paper.pdf (we reimplemented the statistical computations at http://soft-dev.org/src/libkalibera/ to make it more accessible). I don't think this method is the end of the story, but it's a definite improvement: we were surprised at some of the odd effects it highlighted (non-determinism was not what I was expecting). It's definitely changed how I think about benchmarking.
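The Kalibera/Jones method builds on repeated measurements at several levels (iterations within executions within binaries) and reports bootstrapped confidence intervals rather than a single number. A minimal sketch of just the bootstrap step, assuming a flat list of timings (the real method, and libkalibera, resample across the whole hierarchy):

```python
import random
import statistics

def bootstrap_ci(samples, n_resamples=1000, confidence=0.95, seed=42):
    """Percentile-bootstrap confidence interval for the mean of `samples`.

    This is only the flat, single-level case; the Kalibera/Jones method
    handles the full multi-level benchmark hierarchy.
    """
    rng = random.Random(seed)
    # Resample with replacement, record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - confidence) / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]

# Example: 30 noisy timings around 1.0s (synthetic data for illustration).
timings = [1.0 + 0.05 * random.Random(i).gauss(0, 1) for i in range(30)]
lo, hi = bootstrap_ci(timings)
print(f"mean={statistics.fmean(timings):.3f}s, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Reporting the interval instead of a point estimate is the key discipline: two runs whose intervals overlap can't honestly be called a speedup.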

oneofthose 11 years ago

Great article. The gist I get from it is: running an experiment in computer science is easy (just ./bench); running an experiment in computer science correctly is hard. I agree with this assessment.

This plotty tool [0] seems interesting and valuable - but I'm not sure how it relates to the problem the author talks about.

[0] https://github.com/jamesbornholt/plotty

mstromb 11 years ago

Why would linking order affect runtime performance? Something to do with the interaction between offsets and cache, maybe?

Would it be possible to determine ahead of time what order would maximize performance, or would that require profiling?

  • rectang 11 years ago

    I'd speculate that if you're unlucky about link order, two hot cache lines may get mapped to the same slot in an N-way associative cache -- whereas if you're lucky, they end up going to different slots and don't continuously evict each other.

    With regards to alignment... do linkers typically pack objects so tightly that the start of each object isn't aligned on a cache line boundary? AFAIK they're typically 32, 64, or 128 bytes.
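The set-conflict scenario is easy to illustrate with the usual index arithmetic. A toy sketch, assuming a 32 KiB, 8-way cache with 64-byte lines (typical L1d parameters; the actual geometry varies by CPU): addresses one way-stride apart land in the same set, and once more hot lines than ways collide there, they keep evicting each other.

```python
LINE_SIZE = 64   # bytes per line (common, but implementation-defined)
NUM_SETS = 64    # 32 KiB / 64 B per line / 8 ways = 64 sets
WAY_STRIDE = LINE_SIZE * NUM_SETS  # 4096 B: this far apart -> same set

def cache_set(addr):
    """Which set of a set-associative cache a byte address maps to."""
    return (addr // LINE_SIZE) % NUM_SETS

# Two hot objects that the linker happened to place 4 KiB apart:
a, b = 0x1000, 0x2000
print(cache_set(a), cache_set(b))               # → 0 0 (same set: they compete)

# Shift one of them by a single line and the conflict disappears:
print(cache_set(a), cache_set(b + LINE_SIZE))   # → 0 1
```

With only two contenders an 8-way cache copes fine; the pathology appears when link order packs more than eight hot lines into one set.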

    • fleitz 11 years ago

      Cacheline boundary?

Probably, because cache line sizes are an implementation detail, not part of the architectural specification.

  • lgeek 11 years ago

Linking order affects the layout of code and static data, which in turn affects their cache alignment. Similarly, environment variables are pushed onto the stack by the kernel, so their total size affects the alignment of application data on the stack.

    Maybe there are other causes as well.

    > Would it be possible to determine ahead of time what order would maximize performance, or would that require profiling?

    I think at the very least, you'd need profiling to determine the hot code path, and that can change depending on input...
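The environment-size effect is also easy to probe directly: re-run the same binary with increasing amounts of padding in the environment and compare timings. A rough harness sketch; the stand-in command and the `PAD` variable name are illustrative, so substitute your actual benchmark (e.g. `["./bench"]`):

```python
import subprocess
import sys
import time

# Stand-in command so the sketch is self-contained; replace with your benchmark.
CMD = [sys.executable, "-c", "pass"]

def time_with_env_padding(pad_bytes, repeats=5):
    """Run CMD with `pad_bytes` of extra environment data; return wall times.

    The kernel copies the environment onto the new process's stack, so
    growing it shifts the alignment of everything placed above it.
    """
    env = {"PATH": "/usr/bin:/bin", "PAD": "x" * pad_bytes}
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(CMD, env=env, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times

for pad in (0, 64, 128, 256, 512, 1024):
    ts = time_with_env_padding(pad)
    print(f"pad={pad:5d}B  min={min(ts):.4f}s")
```

If the minima drift with padding size on a supposedly unchanged binary, the "speedup" you measured after a code change may be an alignment accident, which is exactly the trap the article warns about.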

  • halayli 11 years ago

    Guessing, locality of reference might play a role.

    • fleitz 11 years ago

      That, the branch predictors, and caching behaviors, n-way, alignment, etc.

That probably explains why different runs vary so widely. I always attributed it to other things going on in the OS and never really thought about the caches, etc.

      • halayli 11 years ago

        > That, the branch predictors, and caching behaviors, n-way, alignment, etc.

Those all fall under locality of reference, btw. But yeah, caching and branch prediction play the biggest role on that list.

        One thing that stung me in the past was OS scheduling.

peterwwillis 11 years ago

There are lies, damned lies, and software benchmarks.

a3089268 11 years ago

I think the author has computer science and computer engineering confused.

  • scott_s 11 years ago

I don't think he does. What he describes is part of what my colleagues and I consider "computer science." We typically consider "computer engineering" to be the design and building of hardware. But to be a systems researcher in computer science, you must know how these things work, and be able to reason about how they affect the software systems you care about.
