The proof that π = 4

In an article published here last week, I discussed the perils of thinking about infinity as a number. More specifically, I criticized the structure of some of the elementary proofs that 0.9999… = 1.

As a teaching prop, I wheeled out the following equation:

\(\underbrace{1 - 1}_{= \ 0} + \underbrace{1 - 1}_{= \ 0} + \underbrace{1 - 1}_{= \ 0} \ \ \ldots = 0\)

This is an endless sum of alternating +1 and -1 terms. Pairwise, they all work out to zero, so the equation seems to make sense.

At the same time, there’s no risk of running out of terms in an infinite sum, so it seems harmless to shift the annotations one position to the right:

\(1 \ \ \underbrace{-1 + 1}_{= \ 0} \ \ \underbrace{-1 + 1}_{= \ 0}\ \ \underbrace{-1 + 1}_{= \ 0} \ \ \ldots = 0\)

This seems to be saying that 1 = 0. Oops.
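To see where the two groupings part ways, it helps to look at the running total rather than the annotations. The snippet below is a minimal sketch; the helper name `partial_sums` is made up for this illustration.

```python
from itertools import accumulate, cycle, islice

def partial_sums(count):
    """Return the first `count` running totals of 1 - 1 + 1 - 1 + ..."""
    terms = islice(cycle([1, -1]), count)
    return list(accumulate(terms))

sums = partial_sums(8)
print(sums)        # [1, 0, 1, 0, 1, 0, 1, 0]
print(sums[1::2])  # totals after each (1 - 1) pair: all zeros
print(sums[0::2])  # totals after the leading 1 and each (-1 + 1) pair: all ones
```

The running total never settles; each grouping simply reads off every other value of the same oscillating sequence.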

The reason I like this “proof” is that it’s hard to reflexively dismiss. A common reaction is “oh yeah, but this left an unpaired - 1 at infinity”, but what does this mean? If there’s a single, specific element at the ∞-th position in the sum, what do we find at position ∞ + 1?…

In the earlier article, we concluded that in contexts like these, infinity must be understood as a process metaphor, not a number. We’re not talking about an infinite number of steps as much as we’re talking about the outcome of an ill-specified number of steps.

Well, sort of. Thinking of infinity as a process helps us make sense of a fair chunk of higher math, but it’s not always enough. Sometimes, it’s easy to fixate on the notion of infinity and miss more basic flaws in our reasoning.

Consider the following troll proof that π = 4:

We begin by drawing a circle with a diameter of 1; it follows that the circumference of this circle is 1π. We then draw a 1×1 square around the circle. The perimeter of the square is a sum of the lengths of its sides: 1 + 1 + 1 + 1 = 4.

Next, we “fold back” small sections near the corners of the square. We trim and reorient segments of length a such that the inverted corner just touches the circumference of the circle. Critically, this operation doesn’t change the perimeter of the outer shape. This should be fairly clear, but we can also double-check the result: each of the remaining long edges has a length of b = 1 - 2a; the newly-added corner sections are 8a in total. The sum of 4b + 8a works out back to 4 regardless of the exact values of a and b.
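Spelled out, the double-check is a one-line substitution of b = 1 - 2a:

\(4b + 8a = 4(1 - 2a) + 8a = 4 - 8a + 8a = 4\)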

Yet, the outer shape is now evidently a better approximation of the circle. If we perform another iteration, folding back the eight protruding corners, we get even closer to a circle without changing the perimeter in any way. If we keep doing this forever, the seemingly inescapable conclusion is that we’ll get infinitely close to the shape of the circle while keeping the perimeter of 4. In other words, the circle’s circumference must also be equal to 4. Or, to put it more bluntly: π = 4.
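The claim is easy to poke at numerically. The sketch below does not reproduce the exact fold-back sequence; as a stand-in, it assumes a staircase that touches the circle at equally spaced angles, which shares the two properties that matter here (axis-aligned segments only, corners creeping toward the circle). The function name and its parameters are invented for the example.

```python
import math

def staircase_around_circle(steps, radius=0.5):
    """Axis-aligned staircase hugging a circle from the outside.

    `steps` is the number of stair steps per quadrant (an assumption of
    this sketch). Returns (perimeter, largest gap to the circle).
    """
    # Touch points on the first-quadrant arc, from 0 to 90 degrees.
    angles = [k * math.pi / (2 * steps) for k in range(steps + 1)]
    xs = [radius * math.cos(t) for t in angles]
    ys = [radius * math.sin(t) for t in angles]

    quarter_length = 0.0
    max_gap = 0.0
    for k in range(steps):
        # Go straight up, then straight left, to reach the next touch point.
        quarter_length += (ys[k + 1] - ys[k]) + (xs[k] - xs[k + 1])
        # The outer corner of the step is its farthest point from the circle.
        corner_gap = math.hypot(xs[k], ys[k + 1]) - radius
        max_gap = max(max_gap, corner_gap)

    # The other three quadrants are mirror images of the first.
    return 4 * quarter_length, max_gap

for steps in (1, 2, 8, 64, 1024):
    perimeter, gap = staircase_around_circle(steps)
    print(f"{steps:5d} steps per quadrant: perimeter = {perimeter:.6f}, max gap = {gap:.6f}")
```

With one step per quadrant, this is just the original square. As the step count grows, the gap shrinks toward zero while the perimeter stays at 4 (up to floating-point noise), which is exactly the behavior the troll proof relies on.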

The “proof” is fascinating in part because it’s multilayered; it trips up both novices and people who are quite conversant in math. For example, many popular YouTube videos offer explanations that are unsatisfying, incomplete, or outright wrong.

If you ask on a math forum, you can simultaneously get closer to and farther away from the truth. The usual response is something along the lines of:

I get it: mathematics doesn’t concern itself with intuition or reality. It operates in a closed universe of axioms; the main thrust of the discipline is to make these axioms as abstract as possible, and specify them as precisely as possible. So, if you don’t want to learn the lingo of mathematical analysis, what are you doing here?

At the same time, you might be one of these entitled bozos who just want to know why the π = 4 proof is wrong. If so, to peel off the first layer, don’t get distracted by the part about infinity. We start by distilling the troll proof to a simpler but functionally identical case — trying to find the length of the diagonal of a 1×1 square:

We have a diagonal of some unknown length. We make the first approximation with a path consisting of a single horizontal segment and a single vertical segment (arrows, left). The overall length of this path is 1 + 1 = 2.

Next, similarly to the circle scenario, we fold back the corner where the two segments intersect. This seemingly gives us four identical sections, each half as long as before (middle diagram); the overall length of the stairstep path appears to be the same as before. We keep going; the shape gets closer and closer to the diagonal, but the walking distance along the jagged path evidently doesn’t budge. As before, the conclusion is that the diagonal has a length of 2, rather than the ~1.41 value you can measure with a ruler or calculate from the Pythagorean theorem.

So, what’s wrong with these proofs? Some internet commenters believe that the resulting shape “never” gets close to what it purports to approximate. If so, the first thing we should confirm is that the construction process actually works the way the troll proof claims it does.

It helps to develop some well-defined metric for that. Most simply, we can analyze the pointwise distance between the stairstep approximation and the diagonal. The following diagram should help:

On the left, I marked the peak distance between the diagonal and the initial approximation, labeled x. We could solve for it using the Pythagorean theorem, but we don’t really need to.

If we look at the rotated view in the lower part of the figure, the actual distance changes linearly from 0 to x and back to zero. Because the ramp is linear, the average distance between the target shape (the diagonal) and the first approximation is equal to one half the maximum. I’m going to invent a symbol for this error and write ε_shape = x/2.

In the center panel, the situation repeats for the next approximation: we have two triangles that are precisely half the size of the earlier one. Within the span of each of these triangles, peak deviation is x/2, so the average ε_shape = x/4. Finally, after one more iteration (right), we get ε_shape = x/8.

The deviation remaining after iteration c can be generalized as:

\(\varepsilon_{shape} = \frac{x}{2^c}\)

Again, in this equation, x is just some positive constant that we couldn’t be bothered to calculate. Either way, the value can be made arbitrarily close to zero by increasing the number of iterations. So, the troll shape approximation algorithm looks just fine: on a pointwise basis, the shape converges on what it is supposed to converge on.
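For the skeptical, here is a rough numerical check of that formula. It assumes the staircase starts at (0, 0), runs right and then up, and that iteration c = 1 is the plain two-segment path; the function name and the sampling density are arbitrary choices made for this sketch.

```python
import math

def shape_error(c, samples_per_segment=200):
    """Average and peak distance between the diagonal and the staircase
    after iteration c (c = 1 is the plain L-shaped path)."""
    steps = 2 ** (c - 1)   # number of stair steps
    size = 1 / steps       # width and height of each step
    distances = []
    for k in range(steps):
        for i in range(samples_per_segment + 1):
            t = size * i / samples_per_segment
            # One sample on the horizontal run of step k, one on its rise.
            for px, py in ((k * size + t, k * size),
                           ((k + 1) * size, k * size + t)):
                # Distance from (px, py) to the line y = x.
                distances.append(abs(px - py) / math.sqrt(2))
    return sum(distances) / len(distances), max(distances)

x = 1 / math.sqrt(2)  # peak distance for the first approximation of a 1x1 square
for c in range(1, 7):
    average, peak = shape_error(c)
    print(f"c = {c}: average = {average:.6f}, x / 2^c = {x / 2 ** c:.6f}, peak = {peak:.6f}")
```

The sampled average tracks x / 2^c and the peak is always twice the average, matching the triangle argument above.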

If the method of constructing the approximation is correct, perhaps we’re mistaken about the length of the constructed curve? It doesn’t feel that way, but once again, it’s best to have a firm metric in place:

On the left, we have a diagonal of some length n and a two-segment path (total length 2). The resulting path error (the walking-distance difference between the two routes) is ε_path = 2 - n.

Next, let’s have a look at the middle diagram. Here, the length of the diagonal is obviously the same as before (n), while the stairstep curve has a length of 4 · ½; this yields ε_path = 4 · ½ - n, which is no change from before. The situation repeats on the right: ε_path = 8 · ¼ - n. The general formula for the error after c steps is:

\(\varepsilon_{path} = \frac{2^c}{2^{c-1}} - n = 2 - n\)

That’s to say, ε_path appears independent of ε_shape; it remains constant (and pretty big) as we iterate.
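The same bookkeeping can be done by brute force: build the staircase vertex by vertex and add up the segment lengths. This is only a sketch, with the same assumed layout as before (start at the origin, run right, then up); the helper names are hypothetical.

```python
import math

def staircase_vertices(c):
    """Corner points of the stairstep path after iteration c."""
    steps = 2 ** (c - 1)
    size = 1 / steps
    vertices = [(0.0, 0.0)]
    for k in range(steps):
        vertices.append(((k + 1) * size, k * size))        # end of horizontal run
        vertices.append(((k + 1) * size, (k + 1) * size))  # end of vertical rise
    return vertices

def path_length(vertices):
    """Walking distance along consecutive vertices."""
    return sum(math.dist(p, q) for p, q in zip(vertices, vertices[1:]))

diagonal = math.sqrt(2)
for c in range(1, 11):
    vertices = staircase_vertices(c)
    length = path_length(vertices)
    print(f"c = {c:2d}: segments = {len(vertices) - 1:4d}, "
          f"length = {length:.6f}, path error = {length - diagonal:.6f}")
```

The segment count doubles with every iteration, but the total length stays at 2 and the path error stays at roughly 0.586.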

So, what’s actually going on? Well, we can observe that the diagonal is smooth while the stairstep approximation gets increasingly jagged. In each iteration, the size of each “detour” is halved, but the number of detours doubles.

Once again, the core claim of the troll proof checks out! The problem isn’t the math: it’s that the proof implies a contradiction where none exists. We ought to ask if pointwise proximity and walking-path distance must be correlated to begin with. After all, you can probably take many routes from home to work or school that are geometrically distant, but have similar lengths. Conversely, two nearby walking paths can have vastly different lengths if one is straight as an arrow and the other zig-zags a lot.

So, is that the whole story? Yes and no. That’s where the proof trips up many folks who are more proficient in math. For a finite (but arbitrarily large) number of iterations, the answer is a firm yes: despite visual similarity, jagged circles and smooth circles are two wholly separate things. Making these jaggies small doesn’t make them disappear.

But if we take the “repeat forever” part of the troll proof literally, we enter the realm of mathematical fiction where the answer can change. In standard analysis — the prevailing flavor of fiction used to deal with infinity in algebraic contexts — attempts to formally analyze the scenario will show that our increasingly jagged curve somehow collapses to a smooth diagonal (or a smooth circle) the moment we start talking about the hypothetical outcome “at infinity”. In this view, the troll proof is incorrect by assuming that the outcome of an infinite process must bear some resemblance to what we see after a finite number of steps.

The best way to develop intuition about this result is to have another look at the earlier formula for the pointwise error between the stairstep pattern and the diagonal:

\(\varepsilon_{shape} = \frac{x}{2^c}\)

We could be forgiven for saying that as c (the iteration count) tends to infinity, the value of ε_shape becomes infinitely small. It’s not wrong, but this kind of talk is verboten: as outlined in the earlier article, infinitesimals have no place on the real number line!

In essence, real numbers must obey the Archimedean property. Roughly, for any two positive real numbers a < b, there must be some finite integer n that flips the inequality when a is multiplied by it (a · n > b). Infinitesimal numbers, which could be very loosely visualized as fractions with “infinity in the denominator”, can’t obey this property. If we allowed them in reals, it would cause a wide range of thorny algebraic issues that we want to avoid.
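For ordinary numbers, finding such a multiplier is trivial, no matter how lopsided the pair. The sketch below uses exact fractions to avoid floating-point rounding; `archimedean_multiplier` is a name invented for this example.

```python
from fractions import Fraction

def archimedean_multiplier(a, b):
    """Smallest positive integer n with a * n > b, for positive a and b."""
    if a <= 0 or b <= 0:
        raise ValueError("both numbers must be positive")
    return b // a + 1  # floor(b / a) + 1

a = Fraction(1, 1_000_000)
b = Fraction(5)
n = archimedean_multiplier(a, b)
print(n, a * n > b)  # 5000001 True
```

A true infinitesimal would be a positive value for which no such n exists, however long we keep counting.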

This means that in standard mathematical discourse, “infinitely close to zero” and “equal to zero” are effectively the same, so the limit of ε_shape is zero. And if ε_shape = 0, then we must conclude that the two figures no longer differ in any way. This also implies that “at infinity”, and not a moment sooner, the length of the constructed curve must jump from 2 to √2 (in the case of a diagonal), or from 4 to π (in the case of a circle).
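In limit notation, and writing L_c for the length of the stairstep path after iteration c (a symbol introduced only for this aside), the two results for the diagonal sit side by side as:

\(\lim_{c \to \infty} \varepsilon_{shape} = \lim_{c \to \infty} \frac{x}{2^c} = 0 \qquad \text{but} \qquad \lim_{c \to \infty} L_c = 2 \neq \sqrt{2}\)

Put differently, the limit of the lengths is 2, while the length of the limiting curve is √2. Nothing in the definition of a limit forces the two to agree.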

This statement may sound weird, but there’s nothing that prohibits such a result. Keep in mind that infinity is not a finite number: it’s an abstraction for a place as distant from real numbers as you can get. Discontinuities can happen as we take a leap from “here” to “there”.

The apparent collapse of our infinitely-jagged shape doesn’t have any profound meaning; it’s just an outcome of a thought experiment in a framework where numbers must be finite, but processes can continue without end. This asymmetry can produce wacky results elsewhere, too; the earlier case of 0.9999… = 1 is another manifestation of the same phenomenon.

If we’re in a philosophical mood, we could insist that the geometric fine structure of the curve survives, just becomes too small to ever exert any influence on real numbers. That’s not just grasping at straws: there are nonstandard analysis approaches that allow infinitesimals and that would keep the two curves distinguishable, for some definitions of infinity.

