The history of light

Last time, I went through the fairly obscure story of scurvy in order to rush to one of the main ideas behind this entire blog — that scientific breakthroughs come from a process of discovery, from experimentally looking around in unexplored places and finding something enlightening.

Now we’ll back up and cover some of the ideas already out there about how science works. In this post, we start slowly, mostly telling another big story, this time about the extended history of special relativity. This turns out to be a nice narrative companion for discussing (among other things) one of the earliest models of science: a priori knowledge.

As it happens, this is also the one great historical counterexample to the very thesis of the last post — but that’s why we are only at the beginning now.

The fundamental task of science is predicting the future based on experiences from the past. If you can do that, you can know how things will or would go without having to wait for them to actually happen. From there, you can prepare for what you cannot change and set up favourably what you can. Starting with no expectations, it’s not clear why this should work at all, or to what extent — in daily life, some things are predictable and routine and some are surprising and one-off. Yet historical experience tells us that some of the complexity can be tamed, that there exist hidden patterns in the world, and that today’s civilization depends on them.

Induction is a process where one takes individual facts about the world (say, the sun rose today, and yesterday, and on millions of days before in recorded history) and turns them into universal statements (the sun rises every day). The problem of induction is that this procedure is unjustified and the world often comes up with surprises.

No amount of confirmed examples of a universal statement can prove it completely. A famous counterexample (due to Bertrand Russell) is a turkey grown by a farmer for Thanksgiving: every day it’s cared for and fed, so it might expect that to keep happening, but then Thanksgiving comes and the turkey is killed. For a real history-of-science example, every attempt at measuring the speed of light known to Descartes had found it too quick to measure, and yet less than thirty years after Descartes died, astronomical measurements by Ole Rømer revealed it to be finite.

The early discussion around the problem of induction is a similar mess to what we explored at the beginning with Descartes and Kant. The biggest name associated with formulating the problem is David Hume, who in his discussion uses the same language as Descartes:

It implies no contradiction that the course of nature may change, and that an object seemingly like those which we have experienced, may be attended with different or contrary effects.

— David Hume, An Enquiry Concerning Human Understanding (1748)

It is possible, [Hume] says, to clearly and distinctly conceive of a situation where the unobserved case does not follow the regularity so far observed.

Stanford Encyclopedia of Philosophy, induction problem1

Hume says that induction is thus not a process of reason but of instinct, and seems to be satisfied with that. But he wrote this in 1748, with plenty of surprising scientific discoveries already behind him: microscopes had shown that apparently uniform plant matter was made of cells, telescopes had shown that not everything orbited the Earth or even the Sun, and Rømer had measured the finite speed of light in 1676. One would like to know when induction can be used, since it (by experience) works some of the time but not always, but Hume ignored that question entirely, instead explaining his view using examples like “snow is associated with cold” that are so unproblematic they aren’t particularly in need of explanation.

Another attempt was by Kant in his Critique of Pure Reason (1781). This text is intimidating, so let’s again cut with a hatchet, starting with this paragraph.

  • It is evident that space and time both present us with many incontrovertible and synthetic propositions a priori.

  • Geometric proof is certain — where does this certainty come from?

  • Either by conceptions (?) or intuitions (?). Empirical conceptions won’t help, because geometric proof is absolute, necessary, and universal, which empirical propositions aren’t.

  • We need a priori conceptions or intuitions. But from conceptions alone you can only obtain analytic cognitions (ones true by definition, e.g. “a triangle has three sides”), not synthetic ones (anything else).

Take, for example, the [synthetic] proposition, “Two straight lines cannot enclose a space, and with these alone no figure is possible,” and try to deduce it from the conception of a straight line, and the number two; or take the proposition, “It is possible to construct a figure with three straight lines,” and endeavour, in like manner, to deduce it from the mere conception of a straight line and the number three. All your endeavours are in vain, and you find yourself forced to have recourse to intuition, as, in fact, geometry always does.

  • But this intuition must be a priori as well, because it can’t be empirical (see above).

  • Thus: if you didn’t have access to a priori intuitions about triangles, to something in your mind that is the same as that which rules triangles, how could you say anything about triangles with certainty?2

  • Without a priori concepts of space and time you could not say anything (synthetic) about external objects.

It is therefore not merely possible or probable, but indubitably certain, that Space and Time, as the necessary conditions of all our external and internal experience, are merely subjective conditions of all our intuitions, in relation to which all objects are therefore mere phænomena, and not things in themselves, presented to us in this particular manner. And for this reason, in respect to the form of phænomena, much may be said a priori, whilst of the thing in itself, which may lie at the foundation of these phænomena, it is impossible to say any thing.

The punchline, and the only reason I’m including all this rambling at all, is that today we know that it is possible for two straight lines (at least, something like straight lines) to enclose a space. On a curved surface, like that of the Earth, this is done by two meridians connecting the North and South poles by different routes. In the space we physically inhabit, this is also possible because of general relativity and the curvature of spacetime, and in fact it’s exactly the phenomenon behind gravitational lensing:

Light can travel along two straight lines that enclose a space and form a figure. (source)

So let’s leave philosophy behind and get into the more understandable area of relativity. Because if, as Kant thought, we could do nothing without an a priori fixed understanding of space and time, then how could the notions of space and time ever have been changed?

With this stage set, let’s move on to some concrete history. We will start with special relativity.

In popular science (e.g. here — the second Google result for “special relativity history” after Wikipedia), but also in many textbooks (e.g. Feynman’s Lectures), there is a short historical account of how special relativity came about that goes roughly like this:

The theories of Newtonian mechanics and classical electrodynamics were quickly (after the latter’s completion by Maxwell in 1864) discovered to be inconsistent: in Newtonian mechanics, observers moving with respect to each other measure the speed of anything (including light) to be different, while classical electrodynamics assumes that the speed of light is a constant common to all observers.

This contradiction was provisionally resolved by saying that Newtonian mechanics was right, that light spread through the luminiferous aether (some kind of medium), and that the speed of light from classical electrodynamics had to be measured relative to the aether.

The Michelson-Morley experiment (1887) attempted to measure the changes in the apparent speed of light caused by Earth’s motion relative to the aether. It did not find anything, so the aether idea had to be discarded. Together with other experiments, it turned out that the speed of light measured by all observers is the same.

The situation was only resolved by Albert Einstein in 1905, when he worked out that actually, it was Newtonian mechanics that needed modification — special relativity. This has a number of consequences like length contraction, time dilation, or E = mc². (The source would then most likely go into details on this part.)

Now, this is not exactly wrong, but it’s a shortcut that compresses a lot under “and various other experiments”. This gets one quickly to special relativity, but asks you to take on trust that the constant-speed-of-light assumption is inevitable.

If you have a bit of a crank’s instinct to figure things out your own way, you can think of other explanations. One common one is aether drag: Maybe the Earth carries the aether along with itself, so we, standing on the Earth, cannot see any change in the speed of light.

Even if you don’t have that crank’s instinct, this shortcut can make it look like a sequence of experiments and ideas that came out of nowhere: why did Michelson and Morley decide to test the aether theory? How did Einstein figure out that the right thing was to revise concepts of space and time? Or contrarily, why did nobody else think of that for 18 years?

In fact, it does all make sense, but you have to dig through more details, because history is big. But things in history and in science are connected, so a lot of the detail we find here is also interesting on its own. We will have to start at the beginning...

Since at least the times of Descartes, there were competing theories of light: either as particles (held e.g. by Newton) or as waves in some universally present medium (held e.g. by Huygens).

These theories both had a lot of room for specifics to fit what was empirically known, and so they could for example both explain Snell’s law of refraction:

Light going from a less optically dense medium (e.g. air) to an optically denser medium (e.g. water) bends towards the normal.

The empirical fact is that each material has a refractive index n and light, when moving from a low-n environment into a high-n one, bends towards the normal according to the formula:

\(n_1 \sin \theta_1 = n_2 \sin \theta_2\)

The wave-theoretical explanation is that waves move slower in a high-n medium, so their spatial frequency k (how close together the peaks of the light wave occur — also called the wave vector) is higher — proportional to n.3 For the waves in the two media to match up at the boundary, the component of k along the boundary cannot change, so the extra spatial frequency has to go into the normal direction and the wave bends. From the trigonometry you get Snell’s law:

This took me way too much time to draw and Inkscape crashed multiple times. :(
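To make that matching argument explicit (a minimal sketch, with the angles measured from the normal as usual): the magnitude of the wave vector is proportional to n, and only its component along the boundary is preserved when the wave crosses it, so

\(k_1 \sin \theta_1 = k_2 \sin \theta_2, \qquad k_i \propto n_i \quad\Longrightarrow\quad n_1 \sin \theta_1 = n_2 \sin \theta_2,\)

which is exactly Snell’s law from the formula above.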

Corpuscular theory has the exact same trigonometric construction, except that instead of light slowing down with higher n, it speeds up, and k is not a wave vector but simply the velocity of the light particles. As for why the speedup happens along the normal, one could imagine high-n materials attracting light particles at a very short range (that is, acting as potential wells).4

One of the phenomena that could distinguish between the two theories was the speed of light: with a corpuscular theory, it would make sense for the light e.g. from different stars to have different speeds, while in a wave theory the source should play no role. Light particles that move more slowly would correspondingly bend more strongly, because the change of speed between media would affect them more.5

This was the line of thought followed by François Arago6 in an 1810 experiment, in which he used refraction at a prism. He hoped that by detecting starlight that bent more or less strongly, he could distinguish between stars with different light speed, and this might be a way to find the mass of stars — heavy stars would exert a stronger gravitational pull on their escaping light, so their light would be slower.

But he didn’t find any differences. Not only between stars, but not even between different times of year, when the Earth moves towards or away from a given star, and so one would expect a different relative speed.

There was the phenomenon of stellar aberration, in which the relative motion of the Earth and a star affects the star’s apparent position in the sky by a small but measurable amount. But nobody could detect any change in the speed of the light itself.

Arago guessed that there might be a confounder: maybe light of different speeds existed, but humans could only see light in a narrow speed range. This was actually not completely off track, since it also seemed to explain the recently discovered calorific and chemical rays (which we today know to be infrared and ultraviolet light respectively, and we know that the relevant quantity for human vision is frequency, not speed).

But this problem also convinced him to consider the wave theory, which could explain why there were no differences between stars of different mass. It could not immediately explain why there were no changes caused by the Earth’s motion (more precisely, by the change in that motion over the year), so Arago wrote to Augustin-Jean Fresnel, who found a solution in 1818.

The wave theory also doesn’t give one an explanation out of the box. If we assume the medium light propagates in (the aether) to be perfectly stationary, then the strength of refraction is also affected by the speed of the prism relative to the aether, so Snell’s law only holds in the rest frame of the aether.

Fresnel’s solution is straightforward if you draw the right diagram for a simplified version of Arago’s experiment: Assume you are not moving relative to the aether. If you shine light into a prism at a right angle, it is not refracted. Snell’s law holds. So far, so good.

The speed of light in a vacuum is typically denoted c, so let’s move to that notation. Also notice that I figured out how to use LaTeX in Inkscape.

But now move this entire system (the prism and the incoming light) to the right through the aether at some velocity v. Snell’s law predicts that light moves along the gray line, but the actual path will be the red one, which you get by transforming the old path through the Galilean transformation (i.e., adding x = vt to the light paths).

To explain light getting this extra unaccounted-for speed, Fresnel suggested partial aether drag: the aether, and the light within it, takes on some of the speed of the moving prism it enters.

All the angles and speeds here use the approximation that v is small compared to c.

If w is the observed speed of light, c the speed of light in the aether, v the speed of the medium (here, the prism) relative to the aether, and n the index of refraction, then Fresnel’s partial drag formula says:

\(w = \frac{c}{n} + \left(1 - \frac{1}{n^2}\right)v\)

The coefficient in front of v is less than 1, which is why the drag is called partial.

This solution works for all experiments sensitive only to effects linear in v/c. Today, we know that it is just the relativistic velocity-addition formula applied to c/n (of the light in the medium) and v (of the prism), expanded to first order in v/c.
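As a quick check of that claim (my own arithmetic, not from the post’s sources), add \(c/n\) and \(v\) with the relativistic velocity-addition formula and expand, keeping only terms linear in \(v/c\):

\(\frac{\frac{c}{n} + v}{1 + \frac{v}{nc}} \approx \left(\frac{c}{n} + v\right)\left(1 - \frac{v}{nc}\right) \approx \frac{c}{n} + v - \frac{v}{n^2} = \frac{c}{n} + \left(1 - \frac{1}{n^2}\right)v,\)

which is exactly Fresnel’s formula.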

Soon, wave theory triumphed for other reasons. Once Fresnel presented his theory, Siméon Denis Poisson used it to calculate that there should be a bright spot in the center of the shadow of a circular object. He declared that this nonsensical result proved wave theory wrong, but then Arago did the experiment and saw the spot appear — so it’s alternately called the Fresnel spot, Poisson spot, or Arago spot.

This is a case study of the problems with a priori reasoning.

A modern replication of Arago’s experiment with a coin used as the obstacle. On the right is the light beam with the coin, on the left is the observed image. The Fresnel/Poisson/Arago spot is clearly visible in the middle, and an Airy disk pattern around it is visible if you look closely. The light source here is a laser, which Arago didn’t have, but any point source works. (source video)

While there wasn’t any experimental evidence against Fresnel’s partial drag, it was hard to understand theoretically where the partial coefficient came from. It could be that only a part of the aether dragged light along, or that all of it did but with varying efficiency, but there was no good reason for why this effect would depend on n. The dependence on n was especially weird in light of dispersion (n, and thus the drag strength, is different for light of different frequencies in the same material) and double refraction, where n also depends on polarization — so apparently the aether drags more or less strongly depending on the kind of light.

Speaking of polarization: its correct theory was worked out by 1822, when Fresnel described both linear and circular polarization. For a wave theory of light, it meant that light is analogous not to pressure waves in a gas, but to transverse waves in a solid. This made the aether a strange kind of material: an omnipresent solid that didn’t seem to interact with matter except through light.

All of this was sufficiently counterintuitive that in 1845, George Stokes7 came up with a different theory: total aether drag. His starting idea was that since the aether has to support very fast vibrations (light) but not slow down planets, which move much more slowly, it might be a non-Newtonian fluid: a fluid without viscosity at low speeds (such as the Earth’s), but a solid at high speeds/frequencies. This would give Fresnel’s formula without the partial coefficient:

\(w = \frac{c}{n} + v\)

This theory didn’t fit the evidence without additional assumptions about the aether’s behaviour (for example, Arago’s measurements would have seen total drag if it existed, and stellar aberration couldn’t happen), and even then it never explained anything Fresnel’s model didn’t, but it was still attractive because it offered a kind of mechanical explanation, however tortured.8 It finally died in 1886 when Lorentz showed that Stokes’s additional assumptions were mutually inconsistent. Even after that (between 1893 and 1897), Oliver Lodge and Ludwig Zehnder experimented with rotating heavy lead balls, looking for any effect on light passing close around them, with no success.9

There were more experiments trying to detect the aether wind, all using light in a refractive medium (n > 1), and all with results explainable by Fresnel’s partial drag (e.g. Hoek 1868, Airy 1871). The most convincing of them was the 1851 Fizeau experiment, which directly measured the speed of light in moving water and air, getting exactly the results predicted by Fresnel’s formula.

But note that Fizeau, at the end of his report, only accepts Fresnel’s formula for the speed of light as an empirical fact and writes that the theory as a whole may need extra verification, because “the conception [of aether drag] is so extraordinary”. This goes to show that aether drag was treated with suspicion and people were looking for a better explanation.

In 1864, Maxwell’s equations appeared with the speed of light as a constant c. They made the problem mathematically clear, but didn’t immediately lead to a solution, since it was still unclear what their c should be measured relative to.

It is only against this backdrop that the results of the Michelson-Morley experiment are properly shocking. The Michelson-Morley experiment was uniquely not dependent on refractive media slowing light down: it was sensitive to an effect scaling as (v/c)², and it was performed in air, which has n practically equal to 1, so it could measure the aether wind even in spite of Fresnel’s partial drag.

A sketch of the Michelson-Morley setup. The apparatus could measure the difference in time between two paths the light took (along the long arms), which would differ slightly if an “aether wind” was blowing. The whole thing could be rotated to measure the wind in different directions, and the experiment could be repeated at different times of the year when the Earth moves in different directions (relative to the Sun).
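For a sense of scale (a back-of-the-envelope estimate of my own, not a number from the original paper): the Earth’s orbital speed is roughly 30 km/s, so

\(\frac{v}{c} \approx \frac{3 \times 10^4\ \mathrm{m/s}}{3 \times 10^8\ \mathrm{m/s}} = 10^{-4}, \qquad \left(\frac{v}{c}\right)^2 \approx 10^{-8},\)

so the sought-after effect was about a hundred million times smaller than the quantity being measured, which is why the experiment compared two light paths against each other instead of trying to measure the speed directly.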

Michelson-Morley didn’t find any aether wind. The natural reaction would be Stokes drag: the Earth drags the aether along with itself completely, so there is never any relative velocity between a measuring apparatus and the aether that we could hope to measure (at least on the ground).

But in 1887, Stokes drag was already dead and buried, inconsistent with dozens of other experiments going back over a century (in the case of stellar aberration), and variations on it had been found mathematically impossible. There were attempts at finding some middle ground between complete drag and Fresnel’s partial drag, which were ultimately experimentally disproven in 1925, way after the birth of special relativity, but they were apparently not too convincing even before that, because they required the aether to have even more tortured properties.

All of this goes to say that though Michelson-Morley was a crucial experiment, it could only be one because of a web of supporting experiments that restricted alternative theories, and this web had to reach even comparatively “dirty” topics like the mechanism of human vision.

Another point ties this to what we have learned with scurvy: With Stokes drag, we again see that work done on a completely wrong theory still provided useful information once a future breakthrough (Michelson-Morley) happened.

Learning this history clarified a few things for me:

On its own, detecting the Earth’s motion relative to the aether seems like a mere sanity check of the aether theory, but in the context of the detailed history it is an obvious thing to try to get another piece of the puzzle. The theory of light was in practically constant development, from the debates on waves versus particles in the time of Newton, through the discovery of polarization, Fresnel’s and Stokes’s competing aether drag hypotheses, mechanical aether conceptions, the unification by Maxwell and the Lorentz transformation making the problem mathematically crystal clear, and finally Michelson-Morley. It’s clear that every new way of looking at the problem was bound to be tried by someone.

We teach any scientific field using a small set of key discoveries that introduce the right ideas as quickly as possible, and you can follow along as long as you don’t go off wandering on garden paths. But the actual history also has a long tail of dead ends and supporting evidence that was necessary when the correct answer wasn’t yet known in advance.10 This is good to know if you’re just entering science: not everybody can be Einstein, but many people can be, say, Lodge, and tie off a loose end needed for the whole construction. It also shows that Einstein did not work in a complete vacuum and his contribution appeared at a time when many of the individual ideas were already out there — the correct answer was already quite constrained and thus conceivable.

(And as a bonus, the tail works well at disproving most crank theories.)

When I learned SR, the lecture very reasonably didn’t go into all the aether theories and experiments, because from a modern perspective they are dead ends and you can derive SR from just Michelson-Morley.11 But when you are looking forward instead of backward, you have to look at a big pile of results, choose one and put faith in it, even when it leads you to revise concepts of space and time. Further, aether theories provide an explanation of light in terms of mechanics, which is a much more inviting field: well-understood and conceptually accessible. By rejecting the aether, you reject this nice-to-have connection, getting a result that works but doesn’t come with a comfortable interpretation of why it works. Even after special relativity, Lorentz (in 1913) was unsatisfied and hoped to restore the aether one day.

Between 1887 and 1905 there were many developments and partial results (by Cohn, Wien, Lorentz, FitzGerald, and Poincaré) that found many of the special-relativistic effects we know today — length contraction and time dilation, mass increase at high speeds (at least for charged particles), Lorentz transformations... But all of this was derived as effects within the mechanistic idea of the aether, making a lot of distinctions between “true” and “apparent” values, which made the derivations monstrously complicated and inaccessible.12

Einstein cut through that and found the way to make all of the results come naturally from simple postulates, at the cost of the analogy between light and mechanics being lost. This way he also got some of the more subtle corollaries (including E = mc²) first.

What Einstein certainly discovered first is the concept of a mathematical symmetry (here, Lorentz invariance) being the starting point for creating a physical theory, before any electromagnetism or mechanics or similar “material” consideration. This started an era of about seventy years when theoretical physics was exceptionally productive.

A big related result behind the a priori period of physics is Noether’s theorem, which says there is a one-to-one correspondence between conserved quantities and continuous mathematical symmetries of the Lagrangian defining the underlying theory. (Emmy Noether has a street named after her in Göttingen. photo source)

While special relativity was found thanks to experimental bludgeoning until somebody finally got the right idea, general relativity was built almost entirely a priori, only interacting with experiments after the theory was completed. Albert Einstein built it from basic physical principles (of equivalence and general covariance); almost in parallel, David Hilbert found the same equations by a mathematical approach, essentially as the simplest possible geometric theory of gravity arising from a variational principle.13 In 1915, when the theory came out, the only piece of experimental evidence it had was the anomalous precession of the perihelion of Mercury, and its experimental confirmation in various regimes took decades afterwards. One of Einstein’s originally proposed tests, measuring the gravitational redshift of light, was only done in 1960. So far, all of the tests have come out in its favour.

This is a different regime of science than scurvy! The main reason any of this worked was that there were strong mathematical constraints on what a possible fundamental theory of physics could look like, so there wasn’t room for sweeping contrary evidence under the rug as a confounder. This theme of mathematical constraints was then carried much further using almost solely a priori theoretical considerations.

Geometric symmetries of space are already quite constraining. If you want a physical law with no preferred direction, it must be written in vector form, and the simplest possible equation of that form involving scalar quantities (the Poisson equation) turns up in a number of unrelated areas: electrostatics, the steady state of heat flow or diffusion, the shape of a stretched membrane... There are a couple of other equations, but they all involve a single application of the Laplacian operator.14
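For concreteness (standard textbook forms, not formulas taken from the post itself): in electrostatics the equation reads

\(\nabla^2 \varphi = -\frac{\rho}{\varepsilon_0},\)

with \(\varphi\) the electric potential and \(\rho\) the charge density; for steady-state heat flow the temperature obeys the same equation with the heat source density on the right-hand side (and \(\nabla^2 T = 0\) where there are no sources); and a stretched membrane satisfies it with the transverse displacement on the left and the applied load on the right. The quantities change, the operator does not.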

But the biggest triumph of a priori happened in particle physics.

The Economist produced this chart for its article on the experimental discovery of the Higgs boson. Note that in all cases but two, theory comes before experiment, and the theoretical predictions are all finished by the mid-1970s. (source)

Special relativity forces the symmetry of Lorentz invariance on any theory, and when trying to make a quantum theory of the electron that would satisfy this, Dirac found it necessary for a positively charged version of the electron to exist, and thus predicted the positron in 1931. One would be found in cloud chambers within a year. It had actually been observed before 1931, but not understood correctly, and the new theory made sense of the experiments, rather than vice versa as we have seen before.

As new particles followed, the unambiguous precedence of theory over experiment became the standard. The neutrino was proposed in 1930 by Pauli as a way to maintain conservation of energy (which, by Noether’s theorem, also follows from a symmetry) in beta decays, and it was found in 1956.

Apart from symmetries, there was another important mathematically constraining factor, nicely introduced by continuing the history of light.

It’s hard to tell when the question “what is light?” finally got answered. The unification with electrodynamics would put that year at 1864, but the question of the aether seems integral to the problem, so perhaps the better year would be 1905 with special relativity. But, motivated by the wave-particle question of Newton’s times, one can follow the theory of light even further, through the photoelectric effect and ultraviolet catastrophe, and arrive at quantum mechanics. Then the next answer would be somewhere around 1950 with the work of Feynman, Schwinger, and Tomonaga on quantum electrodynamics, as can be seen e.g. in this paper by Feynman, or 1967 for the final version of the electroweak theory as it appears today in the Standard Model.

The 1950 paper’s abstract ends with an ominous note: “Problems of the divergences of electrodynamics are not discussed.” This refers to the problem that calculations in quantum field theories produce diverging sums, which give nonsensical infinite results. The way around this problem is a procedure called renormalization. This is also where the material gets above my technical understanding, so I’ll just send the interested reader to a paper by John Baez (especially section 4) with a reasonably gentle explanation of both the problem and the solution.

The main point for us is that only some theories can be renormalised. Starting with a theory of the weak force, to get a renormalisable theory one needed to add several additional particles, which is how the W± and Z⁰ bosons were predicted (1968, Weinberg, Salam, Glashow). Explaining how those bosons got mass required adding another extra particle, the Higgs boson. By 1971, Gerard ’t Hooft proved that this theory as a whole was renormalisable, leaving it as it stands today in the Standard Model of particle physics. The SM was completed in 1973 with the discovery of asymptotic freedom in the theory of the strong interaction.

All that being said, the listed arguments are rarely completely a priori. Rather, they work well enough that the most obvious construction turns out to be correct, and retrospectively they may be sharpened with the knowledge of the desired (experimentally verified) solution.

  • There are loopholes listed at the end of the (footnoted) 3+1 paper: once you relax other nonobvious assumptions, like pointlike fundamental particles or locality, you get an explosion of more outlandish theories you would in principle have to refute one by one.

  • The Higgs mechanism was proposed in 1964, but only several years later was it shown that it is renormalisable — rather than starting from the demand of renormalisability and ending necessarily at the Higgs mechanism.

  • Even the principle of equivalence in general relativity does not hold without some subtlety. As my GR textbook says on the matter:

    • [Because the electromagnetic wave equation in general relativity turns out not to satisfy the equivalence principle, which we have used to initially build the theory of GR], after all, the GR equations are not always ensuing clearly and uniquely from fundamental principles, even with the principle of minimal coupling added. A plethora of papers exist on the logical structure of GR (not speaking about still more difficult underlying layers concerning the “ontological” nature of the metric), where these issues are addressed, possibly together with suggestions how to supplement the principles in order to make the transfer to curved-stage physics completely axiomatized. On the other hand, having experienced complications of such efforts, many respected authors finally admit that one has to live with that (with the non-uniqueness), trying to resolve the remaining queries via physical insight and — ultimately — via experiment. Regarding the almost miraculous strength and richness of general relativity and of the Einstein equations in particular, the classical bible MTW nicely comments on the above issues, in §17.5., by saying, casually:

      • “In the beginning axioms told what equation is acceptable. By now the equation tells what axioms are acceptable.”

This is why experimental confirmation was still considered necessary, leading to the period of 40 years when experiments were catching up while theory sat still. This culminated in 2013 with the Large Hadron Collider finding the Higgs boson. And while that was treated as a big discovery (Science termed it Breakthrough of the Year 2012 (a bit ahead of time as CERN wasn’t yet certain it had truly been the Higgs boson) and Peter Higgs finally got the 2013 Nobel prize for his 1964 theoretical work), in a way it wasn’t, because it was the least surprising possible result of the search.

Which brings us to today.

To me, the monster and its absurd size is a nice reminder that fundamental objects are not necessarily simple. The universe doesn’t really care if its final answers look clean. They are what they are by logical necessity, with no concern over how easily we’ll be able to understand them.

— 3blue1brown, Group theory, abstraction, and the 196,883-dimensional monster

Sometimes our little human splashings are not enough.

However hard we try, however strong our heroic human wills (and us humans have such a capacity, such a heroic capacity for believing that the impossible might be possible), sometimes our ridiculously puny human arms are too weak. Sometimes the world is just too big for us, the hurricane too wild, the sea so huge, that it wears out even the bravest of hearts, the strongest of wills.

How to Seize a Dragon’s Jewel, by Cressida Cowell

The logical culmination of all this work would be a Theory of Everything, ideally one that starts from mathematical constraints (some kind of causality, some kind of stability, something like Cogito) and finds that these uniquely determine a set of physical laws. This idea has been recently championed by cosmologist Max Tegmark (author of the 3+1 dimensions paper), who calls it the Mathematical Universe Hypothesis — that the physical universe is not just well understood using mathematics as a tool, but it actually is some specific mathematical structure.

Alas, this has been elusive. There are multiple problems this program has hit.

(Unfortunately, this part has to be somewhat handwavy, unsatisfactory, and full of external links, because a) there are unsolved problems on both the theoretical and empirical side and b) problems get arbitrarily complicated as one approaches the cutting edge.)

The chief problem is that the theories of gravity and quantum mechanics can’t easily be made compatible — specifically, trying to include gravity as a quantum field theory (like the other forces) doesn’t work, because the resulting theory is nonrenormalisable.

But more interestingly for us, some parts of the Standard Model might not actually be a priori necessary. When one digs into papers about why there are three generations of particles (i.e., the electron, muon, and tauon, and three pairs of quarks, when only the electron and the first quark pair are involved in most “everyday” processes), one enters a big pre-understanding speculative mess of dark matter and leptogenesis. A universe with no weak force would look a lot like our own, with the same chemistry and mechanics. It is an open question whether stars could form and be powered without beta decay, and the nested hypotheticals quickly become intractable with no way to experimentally remove the weak force, but that also means that it would be hard to figure this out a priori.

On the empirical side (on top of the dark matter problem), there are two competing models of neutrino oscillations (the only confirmed experimental particle physics discovery beyond the Standard Model), and cosmological models of baryogenesis (why there is much more matter than antimatter) haven’t quite enumerated all their variables yet.

On the theoretical side, the math got overall much more difficult. Einstein could work with Riemannian geometry, already decades old when he needed it, while today... Well: If we follow the history of light theme, it turns out that quantum electrodynamics has some mathematical issues (key word: Landau pole) that make it ill-defined at extremely small length scales (see a paper by John Baez, section 4 here). The scale is absurdly small: about \(10^{-294}\) meters, which with a slight abuse of SI prefixes can be written as 1 yqqqqqqqqqm, and in any case is a regime where the theory isn’t applicable for physical reasons.15 But it’s hard to prove things mathematically about a theory that is not rigorously defined. The Standard Model might have a similar problem. So, in the year 2000, the Clay Mathematics Institute issued a $1,000,000 reward for a resolution of this problem in a greatly simplified model, and the prize has not been awarded yet.

Another issue is that both GR and the Standard Model still have a few free parameters whose values have to be set by experiment (for GR it’s two, the Einstein gravitational constant and the cosmological constant, while for the Standard Model it’s 19, plus possibly more for the neutrino oscillations), and the numbers found so far seem oddly chosen (e.g. one of them seems to be zero for no apparent reason, even though in principle it could be nonzero).

One other thing that should at least be mentioned is string theory, which holds the promise of having no free parameters, and thus of one day producing a Theory of Everything completely constrained by the math.16 Here we go way over my technical knowledge, so I will limit myself to a single observation (for which I am mostly indebted to this article by John Psmith (?)):

The mathematics behind string theory is connected to the monstrous moonshine theorem. Essentially: Once upon a time, there was a giant collaborative mathematical project to classify the finite simple groups — i.e., possible “elementary” symmetries (in some precise sense). This produced 18 infinite classes of those groups and 26 sporadic groups that didn’t fit into any of the regular classes. The largest of those sporadic groups has been called the Monster group,17 because it’s just a monstrously big object representing the symmetries of an object that lives in 196,883-dimensional space. And yet in some sense (which I am not fully clear on) it is fundamental, the only group satisfying a not-too-hard-to-specify property among all the groups — some of the most fundamental mathematical objects in the first place. The monstrous moonshine theorem is then about a connection between the Monster and modular functions, objects closely tied to elliptic curves (which calls to mind again the strange connections in math we’ve seen!), and crucially, it was proven using a theorem originally from string theory.

My point is, very impressionistically, it would make a certain kind of sense for the world to be based on something like the monster group, the most complicated unique object there can possibly be in an important area of mathematics. Such a world could indeed be unique by mathematical necessity and yet be extremely complex.

I have searched around and found no reasonable illustrations for either string theory or any of the abstract topics here. This illustration of the Monster group used by 3blue1brown seems apt.

It has become common among physicists to complain about string theory’s long stretch of funding with no material outcomes. But we do not choose the world we live in, and we do not know in advance the magnitude of the challenge ahead of us. (I don’t know much about the institutional side of things, except that I’ve never met a string theorist in my life, so probably they are less common now than they used to be.)

Currently, string theory doesn’t work, in part because it isn’t really unique — it doesn’t have free parameters, but instead a huge collection of possible false vacua giving different “effective” laws. The number of such possibilities has been cited as \(10^{500}\)–\(10^{272\,000}\), which is both a major obstacle to getting predictions out of string theory and itself suggestive of its generally unfinished state.

There are conceivable experiments to decide between quantum gravity theories, but the ones that would be guaranteed to give new information are absurdly difficult (hence the second quote opening this section). One such test would be directly detecting a graviton, but the paper titled simply Can Gravitons Be Detected?18 starts by considering a 100% efficient detector the size of Jupiter — it would not be enough; making many copies of such detectors in different solar systems might allow one to get enough statistics, but one would also need to somehow shield those detectors from neutrinos, which would require light-years of shielding material (which would collapse into a black hole, so you would have to find some other way...) — and concludes:

Certainly, if a “no graviton” law appears elusive, we do feel entitled to predict that no one will ever detect one in our universe.

There are a few other possible ways forward. On the experimental long tail are things like quantum Cavendish experiments checking the marginal possibility that gravity might not be quantum at all, various searches for dark matter, neutrino detectors to decide between the two neutrino oscillation models, cosmological observations... On the theoretical side are things like the Clay million dollar problem or the swampland — trying to find worlds that string theory cannot describe, to get it to say at least something empirically meaningful. People will keep trying. But we are back at the old discovery mode, at trying out every approach not yet tried and hoping the world gives us a gift.

If the Mathematical Universe Hypothesis were to be true, it would open a whole separate can of worms about what exactly mathematics is. Currently, discussions about which mathematical structures exist / make sense to discuss / are well-defined often terminate in arguments that are physical in nature (about what a sufficiently long-lived mathematician could interact with), and getting rid of this circle is a subtle problem. This will be a new direction we will follow for a couple posts at some point in the near future.

Relativity and quantum mechanics are often discussed as examples of paradigm shifts or scientific revolutions in the work of Thomas Kuhn. I might cover that at some point in the future, although this part is not written yet, because I find Kuhn’s writing style reader-hostile — already from the first sentence:

History, if viewed as a repository for more than anecdote or chronology, could produce a decisive transformation in the image of science by which we are now possessed.

In the beginning, I used cranks as motivation for looking into the long tail of theories and experiments, but of course few actual cranks would be convinced by this post. Crankery is an interesting phenomenon in that crank theories make wild connections between far-away theories, which is the mechanism I have proposed for making progress, and they promise great progress, but nothing comes of it. This can be phrased more formally as the demarcation problem, and it was a big part of the motivation of Karl Popper’s falsifiability criterion. Thus, the next post treats the demarcation problem and falsifiability in detail (continuing the thread of existing theories of science). The SR half of today’s post was originally an example within that section, so we will find part of the groundwork already laid.

This post grew beyond my initial expectations as the reorganisation from my existing notes prompted me to add all the post-SR stuff, much of which is new. This delayed it by a week. Since I anticipate this happening to a greater or lesser degree in the future as well, the Hydra is moving to a biweekly (that is, fortnightly :D) schedule. The next post will thus be scheduled for 4.2.2026 and be normal size, not monstrous size (I swear).
