Understand Entropy in One Chart (observablehq.com)

> In general, we consider data with high entropy to be less informative, and data with less entropy to be more informative.
This is exactly backwards: high entropy means high information content. If the circles are all red or all blue, you need only one bit total to distinguish between those two possibilities. If half of the circles are red and the other half blue, you need one bit per circle to describe them.
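A quick sketch of that bit-counting argument (Python, mine rather than anything from the article; the function name and numbers are just for illustration). Once you know the circles are monochrome, a single bit names which color and each individual circle costs nothing more; at a 50/50 split every circle costs a full bit.

```python
import math

def bits_to_describe(p_red, n_circles):
    """Total bits needed to record the colors of n_circles circles,
    given that a fraction p_red of them are red (the rest blue)."""
    probs = [p for p in (p_red, 1 - p_red) if p > 0]
    bits_per_circle = -sum(p * math.log2(p) for p in probs)
    return bits_per_circle * n_circles

# All red (or all blue): 0 bits per circle once you know which of the two
# monochrome cases you are in -- naming that case is the single bit.
print(bits_to_describe(1.0, 100))   # 0.0
# Half red, half blue: each circle is a fair coin flip, so 1 bit per circle.
print(bits_to_describe(0.5, 100))   # 100.0
```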
This is just the classic problem of mixing technical terms (like information) with their dictionary definitions. Personally, I believe they intend to say the same thing as you. What they are trying to say is that the lower the entropy, the closer the distribution is to a Dirac delta. In their mind, this means you know exactly what the distribution is and hence it is "informative". But, as you point out, that just means you already knew everything, which is the exact opposite of gaining information. In the context of Wordle, guessing a word with 0 entropy would be a wasted guess, since exactly the same candidate words would remain afterwards. That is, guessing a word that has already been guessed. How informative!
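To make the Wordle point concrete, here is a rough sketch (my own, not the commenter's) of how the entropy of a guess is usually computed: group the remaining candidates by the feedback pattern the guess would produce and take the entropy of that grouping. The `feedback` helper is a deliberately simplified stand-in (real Wordle handles repeated letters more carefully), and the word list is made up.

```python
import math
from collections import Counter

def feedback(guess, answer):
    """Simplified Wordle-style feedback per letter:
    'g' = right letter in the right spot, 'y' = letter appears elsewhere, '-' = absent."""
    return tuple(
        'g' if g == a else ('y' if g in answer else '-')
        for g, a in zip(guess, answer)
    )

def guess_entropy(guess, candidates):
    """Expected information (in bits) of a guess over the remaining candidate answers."""
    counts = Counter(feedback(guess, ans) for ans in candidates)
    total = len(candidates)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

candidates = ["crane", "crate", "trace", "caret"]
# Feedback differs for every candidate, so this guess fully separates them: 2 bits.
print(guess_entropy("crane", candidates))
# A guess whose feedback is identical for every candidate yields a single
# pattern, hence 0 bits: the candidate set is unchanged and the guess is wasted.
print(guess_entropy("zzzzz", candidates))
```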
Thanks for this. I was following along just fine until that last sentence, which caused me to think "nope, I guess I don't follow at all".
I struggle to understand entropy because its meaning seems to shift in subtle ways across disciplines (cryptography, thermodynamics, CS, etc.), and I'm never quite sure which version I'm getting. Is there a basic definition of entropy that's extended in different contexts?
> My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.' [1]
[1] https://en.wikiquote.org/wiki/Claude_Elwood_Shannon#:~:text=...,'
There are so many types of entropy. Why not understand it in your own domain first and then work back to the more general definition?
So I studied physics and know a fair bit of stat mech. I, like you, know that the entropy is the weighted log of the number of microstates of a system. Does this help me understand that? What are the different states/outcomes? What are the probabilities of being in a specific state? Also what equation are they using? I'm not sure what this is supposed to help me understand.
Also the entropy you calculate depends on what you know about the system in question...
That function is simply H(x) = -x log2(x) - (1-x) log2(1-x), with the logs in base 2. In general, entropy is E[-log(P(X))] for any random variable X. Microstates and such are a physics/chemistry-centric application (and the namesake), but states rarely get mentioned from an information theory perspective.
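For reference, a minimal sketch of both forms, assuming (as the comment suggests) that the chart plots the binary entropy function of the red-circle fraction x:

```python
import math

def binary_entropy(x):
    """H(x) = -x*log2(x) - (1-x)*log2(1-x), with H(0) = H(1) = 0 by convention."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def entropy(dist):
    """General form E[-log2 P(X)] for a discrete distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(binary_entropy(0.5))                # 1.0 bit: maximal uncertainty
print(binary_entropy(0.99))               # ~0.08 bits: almost no uncertainty
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: a fair four-sided die
```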
Wonderful. Cool to see it on Observable as well.