Imagine a large dataset: say, a list of every country and its population.

| Country | Population |
|---|---|
| Afghanistan | 29,117,000 |
| Albania | 3,195,000 |
| Algeria | 35,423,000 |
| Andorra | 84,082 |
| Angola | 18,993,000 |

*The leading digit is the first digit of each population figure: 2, 3, 3, 8 and 1 in the rows above.*

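As a minimal sketch, assuming every value is a positive integer like the populations above, the leading digit can be found by repeatedly dividing by 10. The helper name `leading_digit` is hypothetical and not taken from the project.

```python
def leading_digit(n: int) -> int:
    """Return the first digit of a positive integer."""
    if n <= 0:
        raise ValueError("expected a positive integer")
    # Drop the last digit until only the first one remains.
    while n >= 10:
        n //= 10
    return n


# The five rows from the table above.
populations = {
    "Afghanistan": 29_117_000,
    "Albania": 3_195_000,
    "Algeria": 35_423_000,
    "Andorra": 84_082,
    "Angola": 18_993_000,
}

digits = {country: leading_digit(pop) for country, pop in populations.items()}
print(digits)
```

Integer division avoids floating-point rounding; values with fractional parts would need a different approach.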
Chances are, the leading digit will be 1 more often than 2, 2 more often than 3, and so on down to 9.
This odd phenomenon is known as Benford's Law. If leading digits were uniformly distributed, each of 1 through 9 would appear about 11% (1/9) of the time. Benford's Law instead predicts a logarithmic distribution, in which 1 leads about 30% of the time and 9 less than 5% of the time. The pattern holds so regularly that it is even used to help detect accounting fraud.
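Formally, Benford's Law gives the probability that the leading digit is $d$ as a base-10 logarithm. The nine probabilities telescope to 1, and evaluating the formula recovers the roughly 30% share for a leading 1, compared with $1/9 \approx 0.111$ under a uniform distribution:

$$
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d \in \{1, 2, \dots, 9\}
$$

$$
\sum_{d=1}^{9} P(d) = \sum_{d=1}^{9} \log_{10}\frac{d+1}{d} = \log_{10}\frac{10}{1} = 1
$$

$$
P(1) = \log_{10} 2 \approx 0.301, \qquad P(2) = \log_{10} 1.5 \approx 0.176, \qquad P(9) = \log_{10}\tfrac{10}{9} \approx 0.046
$$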
See the Wikipedia article on Benford's Law for a more thorough discussion.
This is a simple experiment to see how many large, publicly accessible datasets follow Benford's Law.
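One way to test a single dataset is sketched below. This is an assumption, not necessarily the method this project uses. It compares observed leading-digit counts against Benford's expected proportions, log10(1 + 1/d), using Pearson's chi-squared statistic. A dataset counts as "following" the law when the statistic falls below 15.507, the 5% critical value for 8 degrees of freedom. The names `benford_chi_square` and `looks_benford` are hypothetical.

```python
import math
from collections import Counter

# Benford's expected proportion for each leading digit d = 1..9: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-squared distribution with 8 degrees of freedom
# (9 digit categories, no parameters estimated from the data).
CRITICAL_5PCT_DF8 = 15.507


def leading_digit(n: int) -> int:
    """Return the first digit of a positive integer."""
    if n <= 0:
        raise ValueError("expected a positive integer")
    return int(str(n)[0])


def benford_chi_square(values: list[int]) -> float:
    """Pearson chi-squared statistic: observed leading digits vs. Benford."""
    if not values:
        raise ValueError("need at least one value")
    counts = Counter(leading_digit(v) for v in values)
    total = len(values)
    statistic = 0.0
    for digit, proportion in BENFORD.items():
        expected = total * proportion
        observed = counts.get(digit, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


def looks_benford(values: list[int]) -> bool:
    """Hypothetical pass/fail rule: not rejected at the 5% level."""
    return benford_chi_square(values) < CRITICAL_5PCT_DF8


# Powers of two are a classic sequence whose leading digits follow Benford's Law;
# a list with every leading digit equally common is not.
powers_of_two = [2**k for k in range(1, 1001)]
uniform_digits = list(range(1, 10)) * 100
print(looks_benford(powers_of_two), looks_benford(uniform_digits))
```

With very large datasets the chi-squared test can reject even tiny departures from Benford's Law. A tolerance-based measure such as the mean absolute deviation between observed and expected proportions may therefore suit a survey of many datasets better.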
This site is on GitHub. Please help out by forking the project and adding more datasets.