What Is Entropy?

jasonfantl.com

288 points by jfantl 12 days ago


TexanFeller - 11 days ago

I don’t see Sean Carroll’s musings mentioned yet, so repeating my previous comment:

Entropy got a lot more exciting to me after hearing Sean Carroll talk about it. He has a foundational/philosophical bent and likes to point out that there are competing definitions of entropy set on different philosophical foundations, one of them seemingly observer dependent:

- https://youtu.be/x9COqqqsFtc?si=cQkfV5IpLC039Cl5
- https://youtu.be/XJ14ZO-e9NY?si=xi8idD5JmQbT5zxN

Leonard Susskind has lots of great talks and books about quantum information and calculating the entropy of black holes which led to a lot of wild new hypotheses.

Stephen Wolfram gave a long talk about the history of the concept of entropy which was pretty good: https://www.youtube.com/live/ocOHxPs1LQ0?si=zvQNsj_FEGbTX2R3

quietbritishjim - 11 days ago

I like the axiomatic definition of entropy. Here's the introduction from Pattern Recognition and Machine Learning by C. Bishop (2006):

> The amount of information can be viewed as the ‘degree of surprise’ on learning the value of x. If we are told that a highly improbable event has just occurred, we will have received more information than if we were told that some very likely event has just occurred, and if we knew that the event was certain to happen we would receive no information. Our measure of information content will therefore depend on the probability distribution p(x), and we therefore look for a quantity h(x) that is a monotonic function of the probability p(x) and that expresses the information content. The form of h(·) can be found by noting that if we have two events x and y that are unrelated, then the information gain from observing both of them should be the sum of the information gained from each of them separately, so that h(x, y) = h(x) + h(y). Two unrelated events will be statistically independent and so p(x, y) = p(x)p(y). From these two relationships, it is easily shown that h(x) must be given by the logarithm of p(x) and so we have h(x) = − log2 p(x).

This is the definition of information for a single probabilistic event. The definition of entropy of a random variable follows from this by just taking the expectation.
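A minimal sketch of the two steps above, assuming base-2 logarithms as in the quote. The function names and the example probabilities are illustrative, not taken from the book.

```python
import math

def information_bits(p):
    """Information content h = -log2(p) of an event with probability p."""
    return -math.log2(p)

def entropy_bits(probs):
    """Entropy as the expectation of h over a discrete distribution."""
    return sum(p * information_bits(p) for p in probs if p > 0)

# Two independent events: p(x, y) = p(x) * p(y), so information adds.
p_x, p_y = 0.5, 0.25  # hypothetical probabilities
h_joint = information_bits(p_x * p_y)
h_sum = information_bits(p_x) + information_bits(p_y)

print(h_joint, h_sum)
print(entropy_bits([0.5, 0.5]))  # a fair coin
```

The logarithm is what turns the product p(x)p(y) into the sum h(x) + h(y); the entropy is then just the probability-weighted average of h.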

nihakue - 11 days ago

I'm not in any way qualified to have a take here, but I have one anyway:

My understanding is that entropy is a way of quantifying how many different ways a thing could 'actually be' and yet still 'appear to be' how it is. So it is largely a result of an observer's limited ability to perceive / interrogate the 'true' nature of the system in question.

So for example you could observe that a single coin flip is heads, and entropy will help you quantify how many different ways that could have come to pass. e.g. is it a fair coin, a weighted coin, a coin with two head faces, etc. All these possibilities increase the entropy of the system. An arrangement _not_ counted towards the system's entropy is the arrangement where the coin has no heads face, only ever comes up tails, etc.

Related, my intuition about the observation that entropy tends to increase is that it's purely a result of more likely things happening more often on average.

Would be delighted if anyone wanted to correct either of these intuitions.

asdf_snar - 11 days ago

I throw these quotes by Y. Oono into the mix because they provide viewpoints which are in some tension with those who take the -\sum_x p(x) log p(x) definition of entropy as fundamental.

> Boltzmann’s argument summarized in Exercise of 2.4.11 just derives Shannon’s formula and uses it. A major lesson is that before we use the Shannon formula important physics is over.

> There are folklores in statistical mechanics. For example, in many textbooks ergodic theory and the mechanical foundation of statistical mechanics are discussed even though detailed mathematical explanations may be missing. We must clearly recognize such topics are almost irrelevant to statistical mechanics. We are also brainwashed that statistical mechanics furnishes the foundation of thermodynamics, but we must clearly recognize that without thermodynamics statistical mechanics cannot be formulated. It is a naive idea that microscopic theories are always more fundamental than macroscopic phenomenology.

sources: http://www.yoono.org/download/inst.pdf http://www.yoono.org/download/smhypers12.pdf

xavivives - 11 days ago

Over the last few months, I've been developing an unorthodox perspective on entropy [1] . It defines the phenomenon in much more detail, allowing for a unification of all forms of entropy. It also defines probability through the same lens.

I define both concepts fundamentally in relation to priors and possibilities:

- Entropy is the relationship between priors and ANY possibility, relative to the entire space of possibilities.

- Probability is the relationship between priors and a SPECIFIC possibility, relative to the entire space of possibilities.

The framing of priors and possibilities shows why entropy appears differently across disciplines like statistical mechanics and information theory. Entropy is not merely observer-dependent, but prior-dependent. Including priors not held by any specific observer but embedded in the framework itself. This helps resolve the apparent contradiction between objective and subjective interpretations of entropy.

It also defines possibilities as constraints imposed on an otherwise unrestricted reality. This framing unifies how possibility spaces are defined across frameworks.

[1]: https://buttondown.com/themeaninggap/archive/a-unified-persp...

glial - 11 days ago

One thing that helped me was the realization that, at least as used in the context of information theory, entropy is a property of an individual (typically the person receiving a message) and NOT purely of the system or message itself.

> entropy quantifies uncertainty

This sums it up. Uncertainty is the property of a person and not a system/message. That uncertainty is a function of both a person's model of a system/message and their prior observations.

You and I may have different entropies about the content of the same message. If we're calculating the entropy of dice rolls (where the outcome is the 'message'), and I know the dice are loaded but you don't, my entropy will be lower than yours.
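A rough sketch of the dice example, assuming a specific loading (one face at 50%, the rest at 10% each) that is made up for illustration. It computes the average surprise each observer experiences when the die really is loaded: the informed observer's value is the entropy of the loaded distribution, while the uninformed observer's value is the cross-entropy against a fair-die model.

```python
import math

def expected_surprise_bits(true_probs, model_probs):
    """Average of -log2(model probability) over outcomes drawn from true_probs."""
    return -sum(t * math.log2(m) for t, m in zip(true_probs, model_probs) if t > 0)

loaded = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical loading, known only to me
fair = [1 / 6] * 6                        # your model of the same die

mine = expected_surprise_bits(loaded, loaded)  # entropy of the loaded die
yours = expected_surprise_bits(loaded, fair)   # cross-entropy under the fair model

print(f"mine: {mine:.3f} bits, yours: {yours:.3f} bits")
```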

hatthew - 11 days ago

I'm not sure I understand the distinction between "high-entropy macrostate" and "order". Aren't macrostates just as subjective as order? Let's say my friend's password is 6dVcOgm8. If we have a system whose microstate consists of an arbitrary string of alphanumeric characters, and the system arranges itself in the configuration 6dVcOgm8, then I would describe the macrostate as "random" and "disordered". However, if my friend sees that configuration, they would describe the macrostate as "my password" and "ordered".

If we see another configuration M2JlH8qc, I would say that the macrostate is the same, it's still "random" and "unordered", and my friend would agree. I say that both macrostates are the same: "random and unordered", and there are many microstates that could be called that, so therefore both are microstates representing the same high-entropy macrostate. However, my friend sees the macrostates as different: one is "my password and ordered", and the other is "random and unordered". There is only one microstate that she would describe as "my password", so from her perspective that's a low-entropy macrostate, while they would agree with me that M2JlH8qc represents a high-entropy macrostate.

So while I agree that "order" is subjective, isn't "how many microstates could result in this macrostate" equally subjective? And then wouldn't it be reasonable to use the words "order" and "disorder" to count (in relative terms) how many microstates could result in the macrostate we subjectively observe?
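One way to make the question concrete is to count microstates under each observer's choice of macrostates. This is only a sketch: it assumes 8-character strings over the 62 alphanumeric characters and uses log2 of the microstate count as the entropy. The macrostate labels are hypothetical.

```python
import math
import string

ALPHABET = string.ascii_letters + string.digits  # 62 alphanumeric characters
LENGTH = 8
TOTAL = len(ALPHABET) ** LENGTH  # number of possible microstates

def macrostate_entropy_bits(num_microstates):
    """Boltzmann-style entropy in bits: log2 of the microstates in a macrostate."""
    return math.log2(num_microstates)

# Observer A lumps every string into a single macrostate.
observer_a = {"random-looking": TOTAL}
# Observer B singles out one string as its own macrostate.
observer_b = {"my password": 1, "random-looking": TOTAL - 1}

for name, partition in (("A", observer_a), ("B", observer_b)):
    for label, count in partition.items():
        print(name, label, round(macrostate_entropy_bits(count), 3))
```

The same microstate 6dVcOgm8 lands in a macrostate of about 47.6 bits for observer A and 0 bits for observer B, so the number depends on the partition each observer chooses.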

IIAOPSW - 12 days ago

It's the name for the information bits you don't have.

More elaborately, it's the number of bits needed to fully specify something which is known to be in some broad category of state but whose exact details are unknown.

voidhorse - 11 days ago

I think this is a pretty good introduction, but it gets a little bogged down in the binary encoding assumption, which is an extraneous detail. It does help to know why the logarithm is chosen as a measure of information, regardless of base; once you know that, "entropy" is straightforward. I'd agree that much of the difficulty arises from the uninformative name and the mystique it carries.

To try to expand on the information measure part from a more abstract starting point: Consider a probability distribution, some set of probabilities p. We can consider it as indicating our degree of certainty about what will happen. In an equiprobable distribution, e.g. a fair coin flip (1/2, 1/2) there is no skew either which way, we are admitting that we basically have no reason to suspect any particular outcome. Contrarily, in a split like (1/4, 3/4) we are stating that we are more certain that one particular outcome will happen.

If you wanted to come up with a number to represent the amount of uncertainty, it's clear that the number should be higher the closer the distribution is to being completely equiprobable (1/2, 1/2)—complete lack of certainty about the result, and the number should be smallest when we are 100% certain (0, 1).

This means that the function has to be an order inversion on the probability values—that is I(1) = 0 (no uncertainty). The logarithm, to arbitrary base (selecting a base is just a change of units) has this property under the convention that I(0) = inf (that is, a totally improbable event carries infinite information—after all, an impossibility occurring would in fact be the ultimate surprise).

Entropy is just the average of this function taken over the probability values (multiply each probability in the distribution by the log of the inverse of the probabilities and sum them). In info theory you also usually assume the probabilities are independent, and so the further condition that I(pq) = I(p) + I(q) is also stipulated.
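As a worked check of the description above, here are the three two-outcome distributions mentioned, taking I(p) = log2(1/p) (base 2 is an arbitrary choice of units) and the usual convention that a zero-probability outcome contributes nothing to the sum:

```latex
\begin{align*}
H(p) &= \sum_i p_i \, I(p_i) = \sum_i p_i \log_2 \frac{1}{p_i} \\
H\!\left(\tfrac12, \tfrac12\right) &= \tfrac12 \log_2 2 + \tfrac12 \log_2 2 = 1 \text{ bit} \\
H\!\left(\tfrac14, \tfrac34\right) &= \tfrac14 \log_2 4 + \tfrac34 \log_2 \tfrac43 \approx 0.5 + 0.311 = 0.811 \text{ bits} \\
H(0, 1) &= 0 + 1 \cdot \log_2 1 = 0 \text{ bits}
\end{align*}
```

The value is largest for the equiprobable split and falls to zero as the distribution becomes certain, which is the ordering asked for above.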

karpathy - 11 days ago

What I never fully understood is that there is some implicit assumption about the dynamics of the system. So what that there are more microstates of some macrostate as far as counting is concerned? We also have to make assumptions about the dynamics, and in particular about some property that encourages mixing.
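A toy illustration of why the dynamics matter is the Ehrenfest urn model, sketched below under the assumption that at each step one ball is picked uniformly at random and moved to the other urn. That rule is a mixing assumption, not something derived from counting. The parameter values and function names are arbitrary.

```python
import math
import random

def ehrenfest(n_balls=100, steps=2000, seed=0):
    """Simulate the urn model; return the number of balls in the left urn over time."""
    rng = random.Random(seed)
    left = n_balls  # start in the lowest-entropy macrostate: everything on the left
    history = [left]
    for _ in range(steps):
        # Pick one ball uniformly at random and move it to the other urn.
        if rng.randrange(n_balls) < left:
            left -= 1
        else:
            left += 1
        history.append(left)
    return history

def log2_multiplicity(n_balls, left):
    """Entropy of the macrostate 'left balls in the left urn', in bits."""
    return math.log2(math.comb(n_balls, left))

history = ehrenfest()
print(history[0], log2_multiplicity(100, history[0]))
print(history[-1], log2_multiplicity(100, history[-1]))
```

With this particular rule the system drifts toward the macrostates with the most microstates. A different rule (for example, one that never moves balls) would leave the system where it started, no matter how the counting comes out.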

tsimionescu - 11 days ago

This goes through all definitions of entropy, except the very first one, which is also the one that is in fact measurable and objective: the variation in entropy is the amount of heat energy that the system exchanges with the environment at a given temperature during a reversible process. While tedious, this can be measured, and it doesn't depend on any subjective knowledge about the system. Any two observers will agree on this value, even if one knows all of the details of every single microstate.
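For a process at constant temperature, that definition reduces to ΔS = Q_rev / T. A small sketch, using approximate textbook values for melting ice (latent heat of fusion of about 334 kJ/kg at 273.15 K) as assumed inputs:

```python
def entropy_change_isothermal(heat_joules, temperature_kelvin):
    """Clausius entropy change for heat exchanged reversibly at constant temperature."""
    if temperature_kelvin <= 0:
        raise ValueError("absolute temperature must be positive")
    return heat_joules / temperature_kelvin

# Assumed, approximate values: melting 1 kg of ice at its melting point.
LATENT_HEAT_FUSION = 334e3  # J/kg, approximate
MELTING_POINT = 273.15      # K

delta_s = entropy_change_isothermal(1.0 * LATENT_HEAT_FUSION, MELTING_POINT)
print(f"{delta_s:.1f} J/K")
```

Both inputs are measured with a calorimeter and a thermometer, so nothing in the calculation refers to what an observer knows about microstates.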

anon84873628 - 11 days ago

Nitpick in the article conclusion:

>Heat flows from hot to cold because the number of ways in which the system can be non-uniform in temperature is much lower than the number of ways it can be uniform in temperature ...

Should probably say "thermal energy" instead of "temperature" if we want to be really precise with our thermodynamics terms. Temperature is not a direct measure of energy; rather, it is an intensive property describing the relationship between a change in energy and the corresponding change in entropy.

brummm - 11 days ago

I love that the author clearly describes why saying entropy measures disorder is misleading.

bargava - 12 days ago

Here is a good overview on Entropy [1]

[1] https://arxiv.org/abs/2409.09232

marojejian - 11 days ago

This is the best description of entropy and information I've read: https://arxiv.org/abs/1601.06176

Most of all, it highlights the subjective / relative foundations of these concepts.

Entropy and Information only exist relative to a decision about the set of states an observer cares to distinguish.

It also caused me to change my informal definition of entropy from a negative ("disorder") to a more positive one ("the number of things I might care to know").

The Second Law now tells me that the number of interesting things I don't know about is always increasing!

This thread inspired me to post it here: https://news.ycombinator.com/item?id=43695358

dswilkerson - 11 days ago

Entropy is expected information. That is, given a random variable, if you compute the expected value (the sum of the values weighted by their probability) of the information of an event (the log base 2 of the multiplicative inverse of the probability of the event), you get the formula for entropy.

Here it is explained at length: "An Intuitive Explanation of the Information Entropy of a Random Variable, Or: How to Play Twenty Questions": http://danielwilkerson.com/entropy.html

bowsamic - 11 days ago

I didn’t read in depth, but at first glance (please correct me if I’m wrong) it seems that, as with all articles on entropy, this explains everything but the classical thermodynamic quantity called entropy, which is 1. the quantity to which all these others are chosen to be related and 2. the one that is by far the most difficult to explain intuitively.

Information and statistical explanations of entropy are very easy. The real question is, what does entropy mean in the original context that it was introduced in, before those later explanations?

im3w1l - 11 days ago

So here is an amusing thought experiment I thought of at one point.

Imagine a very high resolution screen. Say a billion by a billion pixels. Each of them can be white, gray or black. What is the lowest entropy possible? Each of the pixels has the same color. How does the screen look? Gray. What is the highest entropy possible? Each pixel has a random color. How does it look from a distance? Gray again.

What does this mean? I have no idea. Maybe nothing.

Also sorry for writing two top level comments, but I just really care about this topic

flanked-evergl - 11 days ago

Not sure what the point of this article is; it seems to focus on confusion which could be cleared up with a simple visit to Wikipedia.

> But I have no idea what entropy is, and from what I find, neither do most other people.

The article does not go on to explain what entropy is, it just tries to explain away some hypothetical claims about entropy which as far as we can tell do hold, and does not explain why, if they were wrong, they do in fact hold.

im3w1l - 11 days ago

As a kid I wanted to invent a perpetuum mobile. From that perspective, entropy is that troublesome property that prevents a perpetuum mobile of the second kind. And any fuzziness or ambiguity in its definition is a glimmer of hope that we may yet find a loophole.

jwilber - 11 days ago

There’s an interactive visual of Entropy here in the Where To Partition section (midway thru the article): https://mlu-explain.github.io/decision-tree/

jwarden - 11 days ago

Here's my own approach to explaining entropy as a measure of uncertainty: https://jonathanwarden.com/entropy-as-uncertainty

FilosofumRex - 11 days ago

Boltzmann and Gibbs turn in their graves every time some information theorist mutilates their beloved entropy. Shannon & von Neumann were hacking a new theory of communication, not doing real physics, and never meant to equate thermodynamic concepts to encoding techniques - but alas, now dissertations are written on it.

Entropy can't be a measure of uncertainty, because all the uncertainty is in the probability distribution p(x) - multiplying it with its own logarithm and summing doesn't tell us anything new. If it did, it'd violate quantum physics principles including the Bell inequality and Heisenberg uncertainty.

The article never mentions the simplest and most basic definition of entropy, i.e. its units (kJ/K), nor the 3rd law of thermodynamics, which is the basis for its measurement.

“Every physicist knows what entropy is. Not one can write it down in words.” Clifford Truesdell

Ono-Sendai - 11 days ago

Anyone else notice how the entropy in the 1000 bouncing balls simulation goes down at some point, thereby violating the second law of thermodynamics? :)

gozzoo - 11 days ago

The visualisation is great, the topic is interesting and very well explained. Can somebody recommend some other blogs with a similar type of presentation?

fedeb95 - 11 days ago

given all the comments, it turns out that a post on entropy has high entropy.

vitus - 11 days ago

The problem with this explanation (and with many others) is that it misses why we should care about "disorder" or "uncertainty", whether in information theory or statistical mechanics. Yes, we have the arrow of time argument (second law of thermodynamics, etc), and entropy breaks time-symmetry. So what?

The article hints very briefly at this with the discussion of an unequally-weighted die, and how by encoding the most common outcome with a single bit, you can achieve some amount of compression. That's a start, and we've now rediscovered the idea behind Huffman coding. What information theory tells us is that if you consider a sequence of two dice rolls, you can then use even fewer bits on average to describe that outcome, and so on; as you take your block length to infinity, your average number of bits for each roll in the sequence approaches the entropy of the source. (This is Shannon's source coding theorem, and while entropy plays a far greater role in information theory, this is at least a starting point.)
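The block-length argument can be checked numerically. The sketch below assumes a hypothetical loaded die (one face at 50%, the others at 10% each; these are not the article's weights), builds a Huffman code over blocks of 1, 2 and 3 rolls, and compares the average bits per roll with the entropy of a single roll.

```python
import heapq
import itertools
import math

def entropy_bits(probs):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Code length of each symbol in a Huffman (optimal prefix) code."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to each symbol beneath it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def bits_per_roll(probs, block):
    """Average Huffman code length per roll when coding `block` rolls at a time."""
    block_probs = [math.prod(c) for c in itertools.product(probs, repeat=block)]
    lengths = huffman_lengths(block_probs)
    return sum(p * n for p, n in zip(block_probs, lengths)) / block

die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]  # hypothetical loaded die
print("entropy:", entropy_bits(die))
for block in (1, 2, 3):
    print(block, bits_per_roll(die, block))
```

For block length n the per-roll average is guaranteed to lie between the entropy H and H + 1/n, which is the sense in which it approaches the entropy of the source as n grows.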

There's something magical about statistical mechanics where various quantities (e.g. energy, temperature, pressure) emerge as a result of taking partial derivatives of this "partition function", and that they turn out to be the same quantities that we've known all along (up to a scaling factor -- in my stat mech class, I recall using k_B * T for temperature, such that we brought everything back to units of energy).

https://en.wikipedia.org/wiki/Partition_function_(statistica...

https://en.wikipedia.org/wiki/Fundamental_thermodynamic_rela...
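As a small illustration of quantities falling out of derivatives of the partition function, the sketch below takes a hypothetical two-level system and checks that the mean energy computed directly from Boltzmann weights matches -d(ln Z)/d(beta), with beta = 1/(k_B T) and energies in the same units as k_B T. The derivative is approximated by a central finite difference.

```python
import math

def log_partition(beta, energies):
    """ln Z, where Z is the sum of exp(-beta * E) over the energy levels."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

def mean_energy_direct(beta, energies):
    """Average energy using Boltzmann weights exp(-beta * E) / Z."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return sum(e * w for e, w in zip(energies, weights)) / z

def mean_energy_from_z(beta, energies, h=1e-6):
    """Average energy as -d(ln Z)/d(beta), via a central finite difference."""
    upper = log_partition(beta + h, energies)
    lower = log_partition(beta - h, energies)
    return -(upper - lower) / (2 * h)

levels = [0.0, 1.0]  # hypothetical two-level system
beta = 1.0
print(mean_energy_direct(beta, levels), mean_energy_from_z(beta, levels))
```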

If you're dealing with a sea of electrons, you might apply the Pauli exclusion principle to derive Fermi-Dirac statistics that underpins all of semiconductor physics; if instead you're dealing with photons which can occupy the same energy state, the same statistical principles lead to Bose-Einstein statistics.

Statistical mechanics is ultimately about taking certain assumptions about how particles interact with each other, scaling up the quantities beyond our ability to model all of the individual particles, and applying statistical approximations to consider the average behavior of the ensemble. The various forms of entropy are building blocks to that end.

alex5207 - 11 days ago

Super read! Thanks for sharing

sysrestartusr - 11 days ago

at some point my take became: if nothing orders the stuff that lies and flies around, any emergent structures that follow the laws of nature eventually break down.

organisms started putting things in places to increase "survivability" and thriving of themselves until the offspring was ready for the job at which point the offspring started to additionally put things in place for the sake of the "survivability" and thriving of their ancestors ( mostly overlooking their nagging and shortcomings because "love" and because over time, the lessons learned made everything better for all generations ) ...

so entropy is only relevant if all the organisms that can put some things in some place for some reason disappear and the laws of nature run until new organisms emerge. ( which is why I'm always disappointed at leadership and all the fraudulent shit going on ... more pointlessly dead organisms means less heads that can come up with ways to put things together in fun and useful ways ... it's 2025, to whomever it applies: stop clinging to your sabotage-based wannabe supremacy, please, stop corrupting the law, for fucks sake, you rich fucking losers )

nanna - 11 days ago

Yet another take on entropy and information focused on Claude Shannon and lacking even a single mention of Norbert Wiener, even though they invented it simultaneously and evidence suggests Shannon learned the idea from Wiener.

NitroPython - 11 days ago

Love the article, my mind is bending but in a good way lol

DadBase - 11 days ago

My old prof taught entropy with marbles in a jar and cream in coffee. “Entropy,” he said, “is surprise.” Then he microwaved the coffee until it burst. We understood: the universe favors forgetfulness.

ponty_rick - 11 days ago

As a software engineer, I learned what entropy was in computer science when I changed the way that a function was called which caused the system to run out of entropy in production and caused an outage. Heh.

alganet - 12 days ago

Nowadays, it seems to be a buzzword to confuse people.

We IT folk should find another word for disorder that increases over time, especially when that disorder has human factors (number of contributors, number of users, etc). It clearly cannot be treated in the same way as in chemistry.