The Art of Computer Programming: Random Numbers
When you step back to think about it, a huge number of CS theory papers rely on this hand-wavy definition of "randomness" to prove something. Oddly, though, if you actually press a CS theory person for a mathematical definition of this concept, they virtually always draw a blank.
As a student I remember thinking that this was incredibly alarming. Without a good understanding of this concept, how can we be sure any of what we're saying is even remotely correct?
What Knuth has given us here is the ideal treatment of the subject, essentially putting the question of what randomness is to rest for good. He starts with a comprehensive, entertaining history of the landscape (in short: researchers ping-ponged between definitions of randomness so strict that nothing qualified as random, and so loose that even pathologically predictable sequences qualified) before finally proving a theorem that completely solves the problem.
It is a monumental accomplishment in the field, and it is quite shocking to me that it's still so obscure. It's one of my favorite results in all of CS, anywhere.
If you haven't had the chance, I highly recommend the rest of this volume (Seminumerical Algorithms). It is just stuffed with other surprising, amazing results.
Seminumerical Algorithms is Knuth's Silmarillion to his jaunty Hobbit of Volume I and the serious Lord of the Rings questing that is Volume III. OK, I know it's a stretch, but Seminumerical Algorithms is more of a back story... these days you're much less likely to write a floating-point implementation than a linked list or a binary search.
But picking up a like-new used copy from Amazon for less than $10 with shipping made it too good to pass up, and like the rest of Knuth, every time I read a few pages I learn something new.
John Denker (ex Bell Labs) has a good discussion on devices which rely on random distributions:
http://www.av8n.com/computer/htm/secure-random.htm
---
"My favorite proverb of all is the one that says for every proverb, there is an equal and opposite proverb. In this case, we should note that the proverb about not putting all your eggs in one basket is not necessarily a sound engineering principle. The right answer depends on the margin of error and on the per-basket failure rate. It also greatly depends on the chance of correlated, uncorrelated, and anti-correlated basket failures. Sometimes the best way to get your eggs from point A to point B is to put them all in one basket and take really good care of that basket."
---
What? The modern definition of pseudorandomness (https://en.wikipedia.org/wiki/Pseudorandomness#Pseudorandomn...) was figured out in the 80s through the works of Blum, Goldwasser, Micali, Goldreich, etc. and is not hand-wavy at all. It's pretty rigorous and reliable.
You are 100% mistaken that this implies we had a good definition of random sequences before Knuth. In the article you link, they discuss the uniform distribution, but any distribution (and the modern notion of probability in general) absolutely depends on a mathematically precise notion of random sequences.
If you don't believe me, read the chapter. Early probability theorists (e.g., von Mises, Kolmogorov) literally started thinking about randomness in order to define probability.
EDIT: And, I don't suppose it's worth pointing out that pseudorandomness is not at all the same thing as randomness. The fact that you seem to use them interchangeably is not a good sign IMHO.
EDIT 2: Why the unexplained downvote, HN? :(
I skimmed through the extract presented (didn't have time to go into detail) but I don't see a formal definition of any kind presented in the extract. Could you point me to where it is? And if it's not in the extract, then could you quote it here?
Pseudorandomness is not the same thing as randomness but most algorithms today work on pseudorandom numbers so the concept is important. My impression was that that's what you were referring to.
PS :- FYI I didn't downvote your comment. Actually upvoted, as your post made me discover some new math (various notions of randomness by Kolmogorov, von Mises, Martin-Löf) :)
So, the excerpt here is chapter 3, section 1. The actual definition happens in chapter 3, section 5 ("What Is a Random Sequence?"). The book is at home, though, so I can't quote it here. Sorry. But the intuition is: if you have an infinite sequence of random numbers, then the numbers in all infinite subsequences should be equidistributed. So, like, if you have a stream of random 0's and 1's, then if you pick only every other number, the 0's and 1's have to be equiprobable, and if you pick every third number, they still have to be equiprobable, etc. This is slightly wrong, but it's on the right track to the actual definition.
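To make the intuition concrete, here's a quick Python sketch (my own illustration, not Knuth's construction) that checks a few arithmetic-progression subsequences of a bit stream for equidistribution. Note that testing a handful of subsequences like this is exactly the "slightly wrong" version; the real definition has to quantify over a suitable family of subsequences:

```python
import random

def ones_fraction(bits, step, offset=0):
    """Fraction of 1s in the subsequence taking every `step`-th bit."""
    sub = bits[offset::step]
    return sum(sub) / len(sub)

# A long stream of pseudorandom bits (fixed seed keeps this reproducible).
rng = random.Random(12345)
bits = [rng.randrange(2) for _ in range(200_000)]

# Each such subsequence should be close to half 1s.
for step in (1, 2, 3, 7):
    assert abs(ones_fraction(bits, step) - 0.5) < 0.02
```

A sequence like 0, 1, 0, 1, ... passes the step-1 test but fails badly at step 2, which is why the subsequence requirement has teeth.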
re: Pseudorandomness, the point of pseudorandomness is the following.
1. A lot of algorithms use randomness to make pathologically bad cases extremely unlikely. For example, choosing a random pivot in quicksort makes the worst case very unlikely.
2. But in a lot of cases, this leads to huge amounts of space consumption. For example, most frequency moment estimations involve a matrix of random numbers. So if you're getting those numbers from a "truly random" source, then you have to store the entire matrix, which can be huge.
3. So, a better solution is to use a pseudorandom number generator! That way you can store a seed of s bits, and do something clever, like deterministically re-generate the matrix as you need it, rather than storing it outright.
Notice though, that this is not independent of the notion of randomness! In fact they are quite intimately tied together.
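As a toy illustration of point 3 (my own sketch, not from any particular paper; real streaming algorithms would typically use something like k-wise independent hash functions here), this is how a constant-size seed lets you re-derive any entry of a huge "random" matrix on demand instead of storing it:

```python
import random

SEED = 42            # this small seed is all we ever store
ROWS, COLS = 1000, 1000

def matrix_entry(i, j):
    """Re-derive entry (i, j) of a ROWS x COLS "random" matrix on demand,
    instead of storing the whole thing."""
    # Toy scheme: seed a fresh generator per entry (hypothetical; chosen
    # only so each (i, j) maps to a distinct, reproducible seed).
    rng = random.Random(SEED * ROWS * COLS + i * COLS + j)
    return rng.random()

# The same entry is reproducible whenever the algorithm needs it again,
# while memory holds only the constant-size seed.
assert matrix_entry(3, 7) == matrix_entry(3, 7)
```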
Your definition relies on the notion of probability though. So I'm not sure why you seemingly view Knuth's work as more fundamental than Kolmogorov's, etc.
Because the trick Knuth pulls is to express this intuition without appealing to the definition of probability. It's quite clever.
What's the sketch of the trick? I can define randomness by appealing to some of the same basic theory used to develop probability, but it's not really independent despite looking that way from the outside. Does Knuth do this uniquely?
This reminds me of a (somewhat) funny situation that happened to me one time. My parents bought some Lotto tickets and wanted to play, so they filled most of the combinations themselves and left two of them for me. I was annoyed by having to do that, so I take the tickets, and ask my dad, "So, you know that every combination is equally likely to win?", and he casually replies something like, "Yeah, sure, I'm not dumb".
So just to get done with it, I pick 1, 2, 3, 4, 5, 6 and 7 on one combo and 8, 9, 10, 11, 12, 13 and 14 on the other. I give them to him, and he throws his hands in the air and angrily says, "What the hell? You just wasted two combinations, these numbers are never going to get drawn!"
It handily illustrates how hard it is to grasp the concept of true randomness and probability, and even if you do get it, sometimes you'll be caught off guard and your psychological biases will kick in.
I ran through the same script with my dad once, except he answered "Yeah, but you'll have to split the pot with more smartasses like you."
"You know, the most amazing thing happened to me tonight. I was coming here, on the way to the lecture, and I came in through the parking lot. And you won't believe what happened. I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!" -- Feynman
Once you've got pseudorandom bits working, the next step is generating variates from different distributions. Luc Devroye has a nice book on this freely available online, "Non-Uniform Random Variate Generation". http://luc.devroye.org/rnbookindex.html
This book deserves lots of recommendations. Luc Devroye does generally, as well.
Do the Monte Carlo people all know about Barash and Shchur's work? They made an RNGSSELIB and PRAND (using NVIDIA cards), but the main contribution is that they incorporated a new way to generate multiple streams not by re-seeding the generator but by skipping ahead, even with Mersenne twisters. It is the only simple way to get parallel streams, and it didn't exist just a few years ago. You still need to be careful in various ways, but this helps a lot. A lot, a lot.
Skipping ahead is not new; it is a well-known fact that any linearly recurrent generator, such as WELL, Xorshift, or Mersenne Twister, can skip ahead via matrix exponentiation.
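For the curious, here's what skip-ahead looks like in the simplest linear case, an LCG, where one step is the affine map x -> Ax + C (mod M) and you can jump k steps in O(log k) by binary exponentiation of that map. The constants below are Knuth's MMIX LCG parameters; the same idea extends to GF(2)-linear generators like xorshift and MT, just with much bigger matrices:

```python
M = 2**64
A = 6364136223846793005      # multiplier from Knuth's MMIX LCG
C = 1442695040888963407

def lcg_next(x):
    """One step of the generator: x -> A*x + C (mod M)."""
    return (A * x + C) % M

def lcg_jump(x, k):
    """State k steps ahead of x, in O(log k) multiplications.

    Binary exponentiation of the affine map: we keep the accumulated
    transform x -> a*x + c and the current power-of-two step
    x -> mul*x + add, squaring the latter each round.
    """
    a, c = 1, 0              # identity transform
    mul, add = A, C          # one-step transform
    while k:
        if k & 1:            # fold this power of two into the result
            a, c = (a * mul) % M, (c * mul + add) % M
        # square the step: compose it with itself
        mul, add = (mul * mul) % M, (add * mul + add) % M
        k >>= 1
    return (a * x + c) % M
```

Each parallel stream then gets its own disjoint block of the sequence by jumping its start state far ahead, with no intermediate values ever generated.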
And it is not the only way to get good independent streams either; hooking a cipher with a counter is arguably a superior way to do it, cf [1].
[1] http://www.thesalmons.org/john/random123/papers/random123sc1...
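The counter-based idea from [1], in a nutshell: instead of evolving hidden state, apply a keyed bijection (a cipher) to a plain counter, so element n of stream i is just encrypt(key_i, n). Here's a toy Python sketch with SHA-256 standing in for the keyed function; Random123's actual Philox/Threefry generators use purpose-built ciphers, so this is illustration only:

```python
import hashlib

def counter_rng(key: bytes, counter: int) -> float:
    """Toy counter-based generator: hash (key, counter) and map the
    digest to [0, 1). A real CBRNG would use a block cipher here."""
    digest = hashlib.sha256(key + counter.to_bytes(16, "big")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

# Distinct keys give independent streams; each stream is indexed by its
# counter, so parallel workers share no state and need no re-seeding.
stream_a = [counter_rng(b"worker-0", n) for n in range(4)]
stream_b = [counter_rng(b"worker-1", n) for n in range(4)]
```

The appeal for Monte Carlo is that any (stream, index) pair is random-accessible in O(1), which makes reproducible parallel runs trivial.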
I like that paper. Thank you. It has references which criticize the skip-ahead technique. I need to look into whether I believe that paper, though. There are lots of ideas about how to do this, and subtle flaws abound. While many generators are known to support skipping ahead, there is a quite new technique for MT that makes it much faster; it's not by Barash and Shchur, but they used it. Again, thanks.
A million times thank you for that link! I've been needing something exactly like that for a pet project of mine.
If you're interested in Monte Carlo techniques for statistical physics, I'll recommend Kurt Binder's "Monte Carlo Simulation in Statistical Physics" (http://www.springer.com/physics/theoretical%2C+mathematical+...).
I feel compelled to share this quote from the author of cfengine with whom I have had some emails of late, because it neatly summarizes a more holistic, physics-inspired approach to the computational world's obsession with invariance, which could be seen as present in the quest for 'randomness'.
"The simplest idea of stability is constancy, or invariance. A thing that has no possibility to change is, by definition, immune to external perturbations. [...] Invariance is an important concept, but also one that has been shattered by modern ideas of physics. What was once considered invariant, is usually only apparently invariant on a certain scale. When one looks in more detail, we find that we may only have invariance of an average." - Mark Burgess, In Search of Certainty: The Science of Our Information Infrastructure (2013)
This accords well with the opening quotation: "Lest men suspect your tale untrue, / Keep probability in view." - John Gay, English poet and dramatist and member of the Scriblerus Club (1727) https://en.wikipedia.org/wiki/John_Gay
I'm not sure why, but I struggled with the concept of randomness in my algorithms class. On the last day of class, the professor signed his book, "May you one day explain randomness to all of us"
This is old material, right? Or is Knuth publishing a new fascicle?
Volume 2, third edition came out in 1997 http://www-cs-faculty.stanford.edu/~uno/taocp.html
This is old material, from Volume 2. What's new is the recently-published electronic editions, and, I guess, publishing some excerpts in online article form, like this one here.
...and the reference to rap music.