Purifying spoiled randomness with spoiled randomness

mittheory.wordpress.com

35 points by schrototo 10 years ago · 15 comments

gizmo686 10 years ago

I haven't read either of the papers yet, but can anyone comment on what makes it difficult to output multiple bits? Naively, couldn't you gather weak random data from the two sources, get a single bit, then gather more weak data, and run the extractor again to get a second bit?

  • henrytcs 10 years ago

    Hi gizmo686. Yes, what you propose is correct. But every time you iterate this process, you have to find additional sources of weak data that are completely independent of the sources you used in the previous iterations. In that case, you're describing an extractor that requires multiple independent sources.

    The challenge (and it looks like there was some progress a couple of weeks ago) is to produce multiple bits using only two independent sources, and no more than that!
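
    For concreteness, here's a toy Python sketch (the variable names are mine, not from the papers) of the classic inner-product two-source extractor, which squeezes a single nearly unbiased bit out of two independent weak n-bit samples:

        # Inner-product (Chor-Goldreich style) two-source extractor: one output bit.
        # x_bits and y_bits are n-bit samples from two *independent* weak sources;
        # if each source has min-entropy somewhat above n/2, the output bit is
        # close to unbiased.
        def inner_product_extract(x_bits, y_bits):
            assert len(x_bits) == len(y_bits)
            acc = 0
            for xb, yb in zip(x_bits, y_bits):
                acc ^= xb & yb  # sum of x_i * y_i, mod 2
            return acc

        # To get a *second* bit this way, you'd need fresh samples that are
        # independent of x_bits and y_bits -- i.e. more sources.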

fenomas 10 years ago

Can someone in the know post a conceptual description of roughly what's going on here?

The last time I read up on randomness, I was given to believe that it's not really an observable quantity - that is, a sequence of numbers is random only to the extent that nobody has found a pattern in it yet, and as such, the most rigorous way we have of testing strong RNGs is to run them through a battery of tests for the sorts of patterns that are known to show up in weak RNGs. But that sounds far removed from the situation the article describes, where this or that generator can be proven to be perfect or imperfect.

Is this the gap between theoretical analysis and real-world implementations, or am I misunderstanding something more fundamental?

  • ThrustVectoring 10 years ago

    You're on the right track with randomness. "Random number" is a misleading term - it implies that randomness is a property of the number, rather than the more complete picture of "you can expect that people will be unable to predict the number, given a certain set of background information."

    • bbrazil 10 years ago

      To expand on that a little, the main property you're looking for is that if I give you the N random numbers that have already been generated, you can't predict what the next one will be any better than random chance.

      https://en.wikipedia.org/wiki/Cryptographically_secure_pseud... has more detail.
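
      As a toy Python illustration (made-up numbers, not a real-world attack): a textbook linear congruential generator fails this property completely, because each output *is* the internal state, so one observed output predicts every future output with certainty.

          # MINSTD-style LCG: each output equals the next internal state.
          M, A = 2**31 - 1, 48271

          def lcg_next(state):
              return (A * state) % M

          state = 42
          outputs = []
          for _ in range(3):
              state = lcg_next(state)
              outputs.append(state)

          # An attacker who saw only the last output predicts the next one exactly.
          prediction = lcg_next(outputs[-1])
          state = lcg_next(state)
          assert prediction == state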

    • fenomas 10 years ago

      Naturally, but what does that have to do with what I asked?

      • henrytcs 10 years ago

        Hi, this is the author of the blog post here. Fenomas raises a good point that I completely glossed over in my post, which is what it means for something to be random or unpredictable.

        Indeed, one can't ever know for certain whether a sequence of numbers is random (after all, the universe we live in could be a deterministic finite state automaton). Instead, in this context of randomness extractors, we simply assume there is such a thing as random numbers, and that some sources of randomness are more random than others. For example, a uniformly random sequence of 100 bits is more random than a random sequence of 100 bits where the last bit is always the XOR of the previous 99 bits.

        As someone else mentioned, we can quantify the quality of a random source by its entropy (more specifically, min-entropy, but ignore that for now). A randomness extractor operates under the premise that its inputs are sources with some amount of entropy (but maybe not full entropy). Its output is supposed to be a random source with full entropy (it distills "all the good stuff" from the sources).
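
        To make "min-entropy" slightly less hand-wavy, here's a tiny Python sketch (the distributions are made up for illustration):

            import math

            def min_entropy(probabilities):
                # Min-entropy: -log2 of the probability of the single most likely outcome.
                return -math.log2(max(probabilities))

            # A perfectly uniform 2-bit source: 4 outcomes, each with probability 1/4.
            print(min_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits (full entropy)

            # A lopsided 2-bit source: still 4 outcomes, but one dominates.
            print(min_entropy([0.7, 0.1, 0.1, 0.1]))      # ~0.51 bits

        An extractor's job is to take samples that look like the second kind and produce output that looks like the first.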

        One might find some philosophical difficulty with the idea that we can ever obtain sources with perfect randomness -- or know that we have. But the point of a randomness extractor is to say, if you believe that you have access to sources with some amount of entropy, then you can use an extractor to obtain sources with full entropy.

        I hope this makes sense.

        • fenomas 10 years ago

          Ahh, so it's basically the difference between theory and practice. Thanks for explaining!

PaulAJ 10 years ago

I thought that entropy solved this problem: your "weakly" random numbers from the thermometer might have, say, 1.3 bits of entropy for every reading. So you assemble 100 readings and that gives you 130 bits of randomness, which you extract by putting your 100 readings through a cryptographic hash algorithm that outputs 130 bits.
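
Concretely, the recipe I have in mind looks something like this Python sketch (the readings are faked, and SHAKE-128 just stands in for "a cryptographic hash with a chosen output length"):

    import hashlib, os

    # Fake 100 low-entropy thermometer readings, one byte each; in the scenario
    # above, each reading is supposed to carry roughly 1.3 bits of entropy.
    readings = bytes(b & 0x03 for b in os.urandom(100))

    # Condense them with a hash whose output length we can pick:
    # 17 bytes = 136 bits, of which we keep the first 130.
    digest = hashlib.shake_128(readings).digest(17)
    bits = bin(int.from_bytes(digest, "big"))[2:].zfill(136)[:130]
    print(bits)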

Presumably I'm missing something. Can someone tell me what it is?

  • versteegen 10 years ago

    Whether a hash function is cryptographic or not is irrelevant: either way it's computable, so given enough computation you can invert it and recover information about the original data you hashed. Because that data was only weakly random, it's not independent of future data from the same sources, so you can compute what future outputs of the hashing algorithm are likely to be.

    Obviously this is impractical in the real world, but in TCS randomness is defined as algorithmic randomness, which completely ignores the question of running time. Randomness is a mathematical concept; something can't suddenly become non-random because you gained a faster computer. For more information see: https://en.wikipedia.org/wiki/Kolmogorov_complexity#Kolmogor...

    • versteegen 10 years ago

      To correct that: there are many different definitions of randomness, and they're not actually using a form of algorithmic randomness in this paper (there are many definitions of that too), which I guess isn't so relevant for randomness extraction.

  • henrytcs 10 years ago

    That's the right intuition. But here we have to distinguish between information-theoretic randomness and computational randomness (also known as algorithmic randomness). In the former, we're guaranteed that the output is actually statistically close to uniform randomness, and no statistical test, no matter how much computing time it uses, can predict any bit of the output. In the latter, the output can actually be far from uniformly random, but it has the guarantee that no efficient program (i.e. polynomial-time algorithm) can distinguish between the output and true uniform randomness.

    A randomness extractor guarantees the former (i.e. information-theoretic randomness), whereas a cryptographic hash function can only, at best, guarantee the latter (i.e. computational randomness). One might say that, for all intents and purposes, we only care about computational randomness anyway. The problem is, cryptographic hash functions only work under complexity assumptions that we don't know to be true. These complexity assumptions have a similar flavor to the P vs NP question, which you might've heard of.

    On the other hand, we can actually outright prove that randomness extractors work, and that they output information-theoretic randomness -- no complexity assumptions required!

    So, long story short, we'd like to theoretically ensure, and prove outright, that we can take weak sources of randomness and amplify them into strong sources of randomness. Randomness extractors do just that.
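
    If it helps, "statistically close to uniform" is measured by statistical (total variation) distance, which you could compute like this in Python (toy distributions, not from the papers):

        # Statistical (total variation) distance between two distributions
        # over the same outcomes: half the L1 distance.
        def statistical_distance(p, q):
            return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

        uniform = [0.25] * 4                         # the ideal 2-bit output
        extractor_output = [0.26, 0.25, 0.24, 0.25]  # a slightly skewed output

        # An extractor guarantees this is tiny, and a small statistical distance
        # bounds the advantage of *every* test, however much computing time it has.
        print(statistical_distance(uniform, extractor_output))  # 0.01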

ademarre 10 years ago

How far away are we from using this practically? e.g. /dev/random
