Measuring Information in Millibytes

engineering.vena.io

47 points by Ethcad 7 years ago · 18 comments

eridius 7 years ago

> This is a little more than eight thousandths of one bit. Therefore, the information given by one passing test run is just a little over one millibyte. To debug this compiler, we're going to be running hundreds and hundreds of experiments, each of which gives us a tiny fraction of one bit, until we can finally put together the single byte we need to isolate the faulty optimization.

It doesn't work that way. If you're dead set on measuring a single passing test run as a millibyte's worth of information, you have to realize that every subsequent passing test run is worth less and less. Otherwise running a thousand passing tests would give you a full byte of information, and yet a thousand passing tests does not give you 100% confidence that the enabled optimizations are not buggy. The only way to get a full byte's worth of information is to actually hit upon a failing test run.

And in fact even this is mistaken. You don't have a millibyte's worth of information. You have a lot more. The number of passing test runs at a given optimization gives you a probability that the optimization is buggy. Claiming that all this information you have about probabilities of the bugginess of various optimization passes is really just a millibyte is misleading; you have a lot of information, you're just ignoring it and saying you only care about a millibyte's worth of it.
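
A minimal Bayesian sketch of this in Python, with made-up numbers (a 50% prior that the enabled set is buggy, and a 1% per-run failure rate if it is), shows both effects: each green run is worth less than the last, and the total can never exceed the prior uncertainty about the yes/no question:

  import math

  def posterior_buggy(prior, fail_rate, n_passes):
      # P(enabled set is buggy | n consecutive green runs), by Bayes' rule.
      # A buggy configuration still passes each run with probability 1 - fail_rate.
      evidence_if_buggy = (1 - fail_rate) ** n_passes
      return prior * evidence_if_buggy / (prior * evidence_if_buggy + (1 - prior))

  def entropy_bits(p):
      # Shannon entropy of a yes/no question with probability p, in bits.
      if p in (0.0, 1.0):
          return 0.0
      return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

  prior, fail_rate = 0.5, 0.01   # made-up numbers, purely for illustration
  for n in (1, 10, 100, 1000):
      post = posterior_buggy(prior, fail_rate, n)
      gained = entropy_bits(prior) - entropy_bits(post)
      print(f"{n:4d} green runs: P(buggy) = {post:.4f}, info gained = {gained:.4f} bits")

With these numbers the information gained creeps toward, but never past, the single bit of prior uncertainty; as the question below asks, the infinite limit is the entropy of your prior, not the byte.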

  • hinkley 7 years ago

    I run into this with much larger p and have to explain it to coworkers.

    Flip the scenario around: the code is fine but the test has a concurrency bug in it. It fails about 20% of the time. You ask someone to fix it. After six green test runs they declare the bug fixed. But then build 8 fails and so does build 10.

    Why? Because probabilities aren’t additive, they’re multiplicative. You didn’t test 6p > 1; you tested (1 - p)^6 = 0, which will never be true.

    In this case there was a 26% chance the test was still broken after 6 green tests. You need 11 runs just to get the probability into single digits (see the sketch below).

    (Another consequence of this is that if you have 5 flaky tests, the probability of a green build is one in four, and the likelihood of getting multiple consecutive red builds is high enough that it happens every week or two).
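
    A quick check of that arithmetic in Python (assuming the 20% per-run failure rate above):

      # Chance that a still-broken test shows n consecutive green runs,
      # if it fails independently with probability p = 0.2 per run.
      p = 0.2
      for n in (6, 11):
          print(f"P(all green after {n} runs) = {(1 - p) ** n:.3f}")
      # -> 0.262 after 6 runs, 0.086 after 11 (the first single-digit percentage)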

  • Silixon 7 years ago

    In the limit of infinite passing tests, do we recover the byte we sought? Or does it asymptote to some other quantity? I'm asking relative to a maximum entropy prior.

cperciva 7 years ago

Tarsnap storage costs 0.25 picodollars per millibyte-month of storage.

  • WJW 7 years ago

    And it works great! Well worth my picodollars.

    • hinkley 7 years ago

      If I write a check with the dollar amount in scientific notation, will my bank honor it? How about a credit union?

    • mercer 7 years ago

      I'm a fan as well, but I'm a bit worried about how slow retrieving an archive is. Any suggestions for increasing the speed of that? The only solution I've thought of is to split up my archives into smaller pieces.

infinity0 7 years ago

> You decide to try running with optimizations 1-128 enabled, and 129-256 disabled. That run fails. Now you know that one of the first 128 optimizations is to blame. You try again with 1-64 enabled, and 65-128 disabled, and that run passes. The bad optimization must be between 65 and 128. Continuing in this fashion—a binary search—you're able to locate the faulty optimization in just eight steps.

This doesn't work for the case where the failure(s) might be caused by a combination of a subset of the optimisations. That is,

1000 0100 0010 0001

all succeed, but

0101

fails.

A binary search is not possible for these more complex bugs. We had such cases in the reproducible builds project.

If you can assume that a failing subset implies a failing superset (subset A exhibits a failure => any superset of A exhibits a failure), then you can just go through all the options (in the OP's case 256) and switch them on cumulatively, one at a time. The running time is then 256, not 8.

For more complex cases where you can't assume even that, you're indeed stuck with 2^256 test cases.
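
A sketch of that cumulative sweep, with a hypothetical is_failing oracle standing in for "build with this set enabled and run the tests":

  def first_culprit(is_failing, n=256):
      # Enable optimizations cumulatively 1..k; under the monotonicity
      # assumption, the first failing prefix pins one member of the bad set.
      enabled = set()
      for k in range(1, n + 1):
          enabled.add(k)
          if is_failing(frozenset(enabled)):
              return k   # optimization k completed a failing combination
      return None        # no failure observed

  # Toy oracle: the failure needs optimizations 5 and 7 together.
  print(first_culprit(lambda s: {5, 7} <= s, n=16))   # -> 7

Under the same assumption, repeating the sweep with the found member pinned on recovers the rest of the combination.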

  • hinkley 7 years ago

    I think for pairs you can still do a logarithmic search; it’s just not base 2.

    A trinary or quaternary search should be able to find two bugs at once (and intuition tells me a quad search should be able to handle three).

    Say you have 16 optimizations and 7 and 12 don’t work together. Eliminate the first half, the second half, the odds, or the evens, and the build goes green. But if you remove only the first third or quarter, the tests still fail.

    So in a trinary search you might see 1-16 -> 6-16 -> 6-13 -> 6-8,12-13 -> 6,7,8,12 -> 6,7,12 -> 7,12. Including the first and last sanity checks you’re in for about 8-14 test passes for 16 tests, which is a lot better than 2^16.

    But there’s still a binary search you can do, and that’s a bisect of the commit history: before some commit this bug didn’t happen, and the first bad commit will contain one half of the bug. Then you run a second binary search to find its complement.

  • carry_bit 7 years ago

    Optimization fuel[1] is a neat solution: have the compiler perform only the first n optimization operations, then use binary search to find which operation was bad. You can then break execution at that point to see what is going wrong.

    [1] http://blog.ezyang.com/2011/06/debugging-compilers-with-opti...
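
    A sketch of that bisection, assuming the misbehavior is monotone in fuel (once the bad transformation has fired, more fuel keeps it fired); is_bad is a stand-in for "compile with n fuel and run the tests":

      def first_bad_fuel(lo, hi, is_bad):
          # Smallest fuel value whose build misbehaves: a standard
          # lower-bound binary search over the pass/fail boundary.
          while lo < hi:
              mid = (lo + hi) // 2
              if is_bad(mid):
                  hi = mid        # the bad operation fires at or before mid
              else:
                  lo = mid + 1    # need more fuel to reach the bad operation
          return lo

      # Toy stand-in: the 173rd optimization operation is the broken one.
      print(first_bad_fuel(0, 1000, lambda n: n >= 173))   # -> 173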

tingletech 7 years ago

If you use base-e logarithms, you can measure information in natural units, aka nats. https://en.wikipedia.org/wiki/Nat_(unit)
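
For scale: 1 nat = 1/ln 2 ≈ 1.4427 bits, so the eight bits the article needs come to 8 ln 2 ≈ 5.545 nats.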

_nalply 7 years ago

Another example of sub-bit information:

Base85! With 85 different ASCII characters, each character carries lb(85) ≈ 6.4094 bits, so 5 of them give slightly more than 32 bits: each of the 5 characters has a surplus of about 0.41 bits, adding up to roughly 2 bits across the group.

To encode to Base85, treat a 32-bit entity as a 32-bit unsigned integer, divide by 85 five times, and use the remainders (0 to 84) as indices into an 85-character alphabet.

To decode, read the five characters back as a number written in base 85. Since 85^5 > 2^32, a few five-character combinations represent values greater than 2^32 - 1; encountering one of those is a decoding error.
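
A sketch of both directions in Python, using the '!' through 'u' alphabet (variants differ in alphabet and digit order, so treat this as illustrative):

  ALPHABET = "".join(chr(c) for c in range(33, 118))   # 85 printable ASCII characters

  def b85_encode(word):
      # One 32-bit unsigned integer -> 5 characters, least significant digit first.
      assert 0 <= word < 2 ** 32
      chars = []
      for _ in range(5):
          word, rem = divmod(word, 85)
          chars.append(ALPHABET[rem])   # each remainder 0..84 indexes the alphabet
      return "".join(chars)

  def b85_decode(text):
      # Read the 5 characters back as a base-85 number; reject overlong values.
      value = 0
      for ch in reversed(text):
          value = value * 85 + ALPHABET.index(ch)
      if value >= 2 ** 32:   # 85**5 > 2**32, so a few strings are invalid
          raise ValueError("not a valid Base85 group")
      return value

  assert b85_decode(b85_encode(0xDEADBEEF)) == 0xDEADBEEF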

codethief 7 years ago

To anyone interested in learning more about Shannon entropy, I recommend reading Shannon's original article. It's highly readable!

There's a link to the PDF on https://en.m.wikipedia.org/wiki/A_Mathematical_Theory_of_Com... .

nayuki 7 years ago

Data compression is another place where measuring in millibytes is feasible. For example, if a video codec takes in pixels in RGB8.8.8 format (3 bytes per pixel) and achieves a 100:1 compression ratio, then you could say that the compressed video uses an average of 30 millibytes per pixel.
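
Spelled out: 3 bytes/pixel ÷ 100 = 0.03 bytes/pixel = 30 millibytes per pixel.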

lucio 7 years ago

>Why eight steps? Well, to differentiate between 256 equally likely possibilities requires eight bits. Each experiment, since it differentiates between two equally likely outcomes (pass and fail), provides one bit of information. Therefore, we need eight such experiments to give us the eight bits we need to determine the faulty optimization.

Is this explanation a little odd?

Each step in a binary search halves the number of suspects; you need 8 steps because eight halvings take you from 256 suspects down to one.
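
Concretely: 256 → 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1 is eight halvings, i.e. log2(256) = 8.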

I can't see it the way the article describes it.

Is it just me? Did anyone else find the article's reasoning for 8 steps a little odd?

  • winceschwab 7 years ago

    Yeah, I noticed that too. Treating a byte as eight tally marks misses the point about the combinatorial nature of place settings.

    It's not 100% wrong, because bit shifting and bit packing are real-world techniques used in memory all the time.

    But these are not the only possible values:

      00000001
      00000010
      00000100
      00001000
      00010000
      00100000
      01000000
      10000000
    
    It's not incorrect to make a decision to limit one's input to those values, but it is incorrect to presume that binary data can only be recorded that way.

    Treating those 8 place settings as eight lightbulb circuits that each need an individual test isn't the right mindset for someone performing compiler optimization.
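
    (Eight binary place settings jointly encode 2^8 = 256 distinct values; the eight one-hot patterns above are only a sliver of them.)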

bitwize 7 years ago

I got a paragraph in and was like -- he had to run eight tests a thousand times to get one yes/no answer each, didn't he.

RobIII 7 years ago

Password entropy is another place where fractional bits can be encountered.
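
For example, a character drawn uniformly from a 26-letter alphabet contributes log2(26) ≈ 4.700 bits of entropy, so no whole number of such characters adds up to a whole number of bits.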
