Purifying spoiled randomness with spoiled randomness

Recently a preprint was posted to ECCC, with the pulse-quickening title “Explicit Two-Source Extractors and Resilient Functions“. If the result is correct, then it really is — shall I say it — a breakthrough in theoretical computer science. One of my favorite things about TCS is how fresh it is. It seems like every other week, an exciting result in all sorts of subareas are announced on arxiv or ECCC. I thought I’d use this as an opportunity to (briefly) explain what this breakthrough is about, for those who aren’t theoretical computer scientists.

Computer programs need sequences of random numbers all the time. Encryption, games, scientific applications need it. But these applications usually assume that they have access to completely unpredictable random number sequences that have no internal patterns or regularities.

But where in the world would we find such perfect, pristine sources of random numbers? The methods that people devise to generate sources of randomness is a fascinating subject in its own right, but long story short, it’s actually really, really hard to get our hands on such good random numbers. Pure randomness, as it turns out, is a valuable resource that is as rare as — perhaps even rarer than — gold or diamond.

On the other hand, it seems much easier to get access to weak randomness, which are number sequences that are pretty unpredictable, but there might be subtle (or not-so-subtle) correlations or relations between the different numbers. For example, one way you might try to get random numbers is to take a digital thermometer that’s accurate to within, say, 2 decimal places — and always take the last digit of the temperature reading whenever you need a random number.
Since the temperature of the air around the thermometer is always fluctuating, this will give some randomness, but it’s probably not perfectly random. If the temperature at this second is 74.32 F, in the next second, it’s probably more likely to be 74.34 F instead of 74.39 F. So there are patterns and correlations within these random numbers.

You might try to devise more elaborate ways to mix up the numbers, but it’s pretty difficult to actually get your scheme to produce perfect unpredictability. Like a pernicious disease, corrupting correlations and predictable patterns can run deep within a sequence of random digits, and removing these unwanted blemishes is a tough task confounded by the fact that you can never be sure if what you’re seeing is a pattern or merely an artifact of the randomness.

But randomness extractors do just do that. They are careful surgeons that expertly cut out the malignant patterns, and leave fresh, pure unpredictability. More technically: randomness extractors are algorithms that are designed to read in weak random number sequences, and output much higher quality randomness — in fact, something that is very, very close to perfectly unpredictable.

Extractors are beautiful algorithms, powered by elegant mathematics and wonderful ideas that have taken decades to understand. And we’re still just beginning to understand them, as this breakthrough result shows.

For a long time now, we’ve known that extractors exist. Via nonconstructive mathematical arguments, we know these algorithms are out there, whose job it is to transform weak randomness into perfect randomness. The only problem was, for quite some time we didn’t know what these algorithms looked like! No one could actually write down a description of the algorithm!

It took a long while, but we finally learned how to construct a certain type of extractor algorithm — called a seeded extractor. Basically, what these algorithms do is to take in, say, 1000 weakly random numbers (for example, temperature readings), and then 10 perfectly random numbers (called the seed), and combine these two sources of numbers to produce 500 perfectly random numbers. So they give you a lot more perfectly unpredictable numbers than you started with.

That’s really nice, but where would you get these 10 perfectly random numbers from? There’s a chicken and the egg problem here (albeit with a smaller chicken and egg each time). Now what theoretical computer scientists have been deep in the hunt for are two-source extractors — the subject of this breakthrough work.

As you might guess, these are extractors that take two unrelated sources of weak randomness, and combine them together to produce ONE perfect random source. So you might imagine taking temperature readings, and then have another apparatus that reads in the direction of the wind, and combine these two weak sources of randomness, to get pristine randomness.

Again, we’ve known that two source extractors exist “in the wild” — but actual sightings of such magical beasts have been rare. Until recently, the best two source extractor we knew how to construct required that the two weak sources weren’t actually that weak — these weak sources needed to be incredibly random, possessing very very few correlations and patterns. Bourgain’s extractor, as it is known, takes these almost-perfect random sources and combines them to be perfect.

It might not sound like much, but Bourgain’s extractor was a tremendous achievement, involving some of the newest ideas from arithmetic combinatorics, a branch of mathematics. And for 10 years, this was the best known.

Until today. Chattopadhyay and Zuckerman have told us how to create a two-source extractor that works with sources that are quite weak. There can be a littany of cross-correlations and patterns and relations between the numbers, but as long as the the random numbers have some unpredictability, their two-source extractor will crunch the numbers, clean up the patterns, clean up the correlations, and produce something that’s pristine, perfect, and unpredictable.

It’s not the end of the story, though. Their algorithm only produces ONE random bit. The next step — and I’m sure people at this very moment are working hard on this — is to improve their algorithm to produce MANY perfectly random bits.

Update (August 15, 2015): It didn’t take long for Xin Li, one of the world’s leading extractor designers, to extend the Chattopadhyay and Zuckerman’s construction to output multiple bits. See his followup paper here. You’re watching science unfold, folks!

–Henry Yuen