The paradox of derivatives and integrals

Here’s the paradox: Analytically, derivatives are easier than integrals. You can take a messy function and differentiate it using the chain rule, no problem. But lots of simple expressions have no analytical integrals. Mathematicians have had to define new things like Bessel functions just because of all the indefinite integrals that can’t be evaluated in closed form.
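
To make this concrete with a standard textbook case (an illustration chosen here, not one discussed in the post): the function exp(-x^2) differentiates in one line by the chain rule, but its antiderivative has no elementary closed form, so it just gets a name, the error function.

```latex
\frac{d}{dx}\, e^{-x^2} = -2x\, e^{-x^2},
\qquad
\int e^{-x^2}\, dx = \frac{\sqrt{\pi}}{2}\,\operatorname{erf}(x) + C,
\quad\text{where}\quad
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt .
```

Note that the definition of erf is itself an integral: the "closed form" on the right is really just a label for the thing we couldn't evaluate.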

So, fine. Differentiation is easier than integration. Everybody knows that; it’s why when you learn calculus they teach you derivatives first and integrals second.

But . . . computationally, it’s the other way. Computationally, derivatives are unstable and integrals are stable. The derivative is the slope of a curve, and if the curve is noisy (as can occur with computed quantities), the derivative will amplify that noise. Differentiation is a high-pass filter. In contrast, integrals are stable: you’re averaging a bunch of little pieces, and even if they’re variable, the average can be smooth.
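
Here is a minimal sketch of that asymmetry, under made-up assumptions: the test curve is sin(x) on [0, pi], the noise is Gaussian with standard deviation 0.001, and the grid has 1001 points. None of these choices come from a real problem; they are just there to show the pattern.

```python
import math
import random

# Hypothetical setup: sin(x) on [0, pi] observed with small Gaussian noise.
rng = random.Random(0)          # fixed seed so the run is repeatable
n = 1001
h = math.pi / (n - 1)           # grid spacing
noise_sd = 1e-3
xs = [i * h for i in range(n)]
ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]

# Derivative by central differences; the truth is cos(x).
# Each estimate divides a difference of two noisy values by 2h,
# so the noise is scaled up by roughly 1/h.
deriv_err = max(
    abs((ys[i + 1] - ys[i - 1]) / (2 * h) - math.cos(xs[i]))
    for i in range(1, n - 1)
)

# Integral by the trapezoid rule; the truth is 2.
# The noise terms are summed and multiplied by h, so they mostly cancel.
integral = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
integ_err = abs(integral - 2.0)

print("worst derivative error:", deriv_err)
print("integral error:        ", integ_err)
```

With these settings the worst derivative error should come out hundreds of times larger than the noise itself, while the integral error should come out smaller than the noise. Shrinking the grid spacing makes the derivative worse, not better.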

To put it another way, derivatives are differences; integrals are sums. Differences are less stable than sums. Just think of examples like 379124.32948293 – 379124.32948287, where you need to keep a huge number of digits in the original values in order to calculate the difference with any precision.
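
The subtraction above can be checked directly. This sketch uses Python's `decimal` module for the exact answer and `struct` to simulate rounding to single precision; the helper name `to_float32` is made up for this example.

```python
import struct
from decimal import Decimal

a_text, b_text = "379124.32948293", "379124.32948287"

# Exact decimal arithmetic: the true difference is 6E-8.
exact = Decimal(a_text) - Decimal(b_text)

# Double precision: each input carries about 16 significant digits,
# but most of them are spent on the leading digits that cancel.
double_diff = float(a_text) - float(b_text)
rel_err = abs(Decimal(double_diff) - exact) / exact

def to_float32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Single precision: about 7 significant digits, so both inputs
# round to the same number and the difference vanishes entirely.
single_diff = to_float32(float(a_text)) - to_float32(float(b_text))

print("exact:            ", exact)
print("double precision: ", double_diff)
print("relative error:   ", float(rel_err))
print("single precision: ", single_diff)
```

In double precision the answer survives, but with only a few correct digits rather than sixteen. In single precision there is nothing left.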

OK, in practice that doesn’t happen when we compute derivatives, but that’s because we use analytical differentiation (nowadays, we’ll use autodiff). Or if we compute numerical derivatives, we still do some smoothing or local approximation. Noisy data can be stably integrated but can’t be stably differentiated.
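
As a rough sketch of what "some smoothing or local approximation" can buy you, here is one simple version: instead of differencing two neighboring points, fit a straight line by least squares to a window of points and use its slope. The test curve, noise level, and window half-width `k = 50` are all arbitrary choices for illustration, and there are many other ways to smooth.

```python
import math
import random

# Hypothetical setup: sin(x) on [0, pi] observed with small Gaussian noise.
rng = random.Random(1)
n = 1001
h = math.pi / (n - 1)
noise_sd = 1e-3
xs = [i * h for i in range(n)]
ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]

def central_difference(ys, i, h):
    """Slope from the two immediate neighbors of point i."""
    return (ys[i + 1] - ys[i - 1]) / (2 * h)

def local_linear_slope(ys, i, h, k):
    """Least-squares slope of a line fit to the 2k+1 points around i.

    On an evenly spaced grid centered at i, the least-squares slope
    reduces to sum(j * y[i+j]) / (h * sum(j^2)).
    """
    num = sum(j * ys[i + j] for j in range(-k, k + 1))
    den = h * sum(j * j for j in range(-k, k + 1))
    return num / den

k = 50                       # window half-width (an arbitrary choice)
interior = range(k, n - k)   # points where the full window fits

raw_err = max(
    abs(central_difference(ys, i, h) - math.cos(xs[i])) for i in interior
)
smooth_err = max(
    abs(local_linear_slope(ys, i, h, k) - math.cos(xs[i])) for i in interior
)

print("worst error, raw differences: ", raw_err)
print("worst error, local linear fit:", smooth_err)
```

The local fit is doing exactly what the text says: it averages before it differences. The price is a bit of bias where the curve bends within the window, which is the usual smoothing tradeoff.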

From that perspective, we’re damn lucky that derivatives are easy to compute analytically. Conversely, it’s not such a problem that we can’t in general do analytic integrals, because, hey, we can just evaluate them numerically, no problem.
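
Both halves of that claim fit in a few lines. What follows is a toy sketch, not how any real autodiff library is built: a tiny dual-number class (forward-mode autodiff) that supports only multiplication, negation, and exp, plus Simpson's rule for the integral. The class `Dual` and the helpers `exp`, `derivative`, and `simpson` are made up for this example; exp(-x^2) is used as the test function because its integral has no elementary closed form.

```python
import math

class Dual:
    """A number that carries its value and its derivative together."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule, applied mechanically.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.der)

def exp(x):
    """exp that works on plain floats and on Dual numbers (chain rule)."""
    if isinstance(x, Dual):
        e = math.exp(x.val)
        return Dual(e, e * x.der)
    return math.exp(x)

def f(x):
    return exp(-(x * x))

def derivative(func, x):
    """Exact derivative at x, up to floating-point rounding."""
    return func(Dual(x, 1.0)).der

def simpson(func, a, b, n=100):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

x0 = 0.7
auto = derivative(f, x0)
by_hand = -2 * x0 * math.exp(-x0 * x0)

numeric = simpson(f, 0.0, 1.0)
reference = math.sqrt(math.pi) / 2 * math.erf(1.0)

print("autodiff vs by-hand derivative:", auto, by_hand)
print("Simpson vs erf-based integral: ", numeric, reference)
```

The derivative involves no differencing at all, so there is nothing to amplify, and the integral needs no closed form, just a hundred function evaluations.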

So is that it–we’re just lucky? I feel like there’s something deeper than that going on.

And it’s not just an issue in math and statistics. The same concerns arise in economics, say. If you have causal identification, it’s easy to estimate aggregates–in econometrics jargon, “average treatment effects.” But it’s hard to estimate effects on the margin (where I’m using the term “marginal” in its economics sense, not its statistics sense). The trouble is, you only have a little bit of data on the knife edge that is the margin. To estimate a marginal effect at all, you need to do some modeling–that is, some averaging. That’s not an issue when you’re estimating an aggregate effect, because there you want to average. An aggregate effect is a sum, an integral. A marginal effect is a difference, a derivative, and it’s inherently harder to estimate.

Again, to estimate that differential effect, you need some modeling, and from there we can take advantage of the ease of analytical differentiation.

So there you have it. An interesting paradox, I think.

P.S. People offer some good thoughts on this in the comments.