OK, I reread that classic paper by Paul Meehl, and . . .


. . . it makes a really important point that I hadn’t even noticed.

The article in question is "Theory-testing in psychology and physics: A methodological paradox," and it's from 1967.

It’s entirely my fault that I missed the point, as it’s in the very first paragraph of the paper, and Meehl even helpfully puts it in italics:

In the physical sciences, the usual result of an improvement in experimental design, instrumentation, or numerical mass of data, is to increase the difficulty of the “observational hurdle” which the physical theory of interest must successfully surmount; whereas, in psychology and some of the allied behavior sciences, the usual effect of such improvement in experimental precision is to provide an easier hurdle for the theory to surmount.

He continues:

Hence what we would normally think of as improvements in our experimental method tend (when predictions materialize) to yield stronger corroboration of the theory in physics, since to remain unrefuted the theory must have survived a more difficult test; by contrast, such experimental improvement in psychology typically results in a weaker corroboration of the theory, since it has now been required to survive a more lenient test.

This post will proceed as follows. First, I’ll explain Meehl’s point and say why I think it’s important. Second, I’ll discuss how it is that I’ve read this famous paper so many times and never noticed its key message. Third, I’ll consider what we should do in social science, given this new understanding.

1. The message of Meehl’s paper

Here’s the key idea. In physics, the model you’re interested in is the null hypothesis. Get enough data and you can reject it. For a simple example, start with Copernicus’s model of the planets going in circles around the sun. Gather enough data and you can reject that model, replacing it with Kepler’s model of elliptical orbits. Gather more data and you can reject that model too–but, hey! you can fix it by hypothesizing another planet: anomalies in the orbit of Uranus led to the prediction and discovery of Neptune. Gather still more data and you can reject the elliptical paths as well (no hypothesized planet could account for the anomalous precession of Mercury’s perihelion). Now we have general relativity, and I don’t think it’s been rejected yet. Similarly with models in particle physics, solid state physics, etc. Gather enough data and you’ll reject your model, and that’s where you learn something.

In psychology, it’s the opposite. The model you’re interested in is that two variables are related to each other: do more of X and you get more of Y. The null hypothesis–that X is independent of Y–that’s not very interesting. I mean, sure, it would be interesting if it’s true, but typically it’s not, and you know it’s not. Gather enough data and you can reject the null.

We know that.

But here’s the interesting point made by Meehl in his 1967 paper: the process of hypothesis testing in psychology is the opposite of that in physics. In physics, as you gather more and more data, you put your model more and more to the test, and eventually it breaks down. This is also how I’ve always framed things in Bayesian data analysis: The point of any statistical model is to be there and do its job long enough for it to be replaced. A model is like a car that you drive until it runs out of gas, then you give it some more gas, you fix it when it breaks down, and eventually it’s more effort to fix than to just build or buy a new car. This is the Lakatos philosophy: it’s Lakatos’s version of Popper.

But in psychology, when you gather more data, it’s easier and easier to reject the null, which makes it easier and easier to confirm your favored hypothesis. All you learn by rejecting the null hypothesis is that you now have enough data to estimate a bigger model. That’s not nothing, but it’s not falsification in the Popperian sense, or a motivation for improvement in the Lakatosian sense. A psychology researcher proceeding by successfully rejecting the straw-man null is not working within the falsificationist paradigm.

It’s funny because the math for hypothesis testing in physics is the same as in psychology, but the interpretation is the opposite.
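To make that concrete, here’s a minimal sketch, not from Meehl’s paper, of the same piece of math read two ways: the probability that a two-sided z-test rejects its null, as the sample size grows. The helper rejection_prob and all the numeric values (a 0.1-sd “psychology” effect, a physics theory predicting 1.00 when the truth is 1.02) are mine, made up purely for illustration, and the code assumes numpy and scipy are available.

```python
# A sketch of Meehl's asymmetry: the same test, two opposite interpretations.
# All effect sizes here are invented for illustration.
import numpy as np
from scipy.stats import norm

def rejection_prob(true_effect, n, sigma=1.0, alpha=0.05):
    """Probability that a two-sided z-test at level alpha rejects H0: effect = 0,
    when the true standardized effect is `true_effect` and the sample size is n."""
    z_crit = norm.ppf(1 - alpha / 2)
    shift = np.sqrt(n) * true_effect / sigma   # noncentrality of the test statistic
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

for n in [20, 200, 2000, 20000]:
    # "Psychology-style" framing: the null (X unrelated to Y) is the straw man,
    # and the favored theory merely predicts some nonzero effect.  With a small
    # but real effect (0.1 sd), rejection gets easier as n grows, so the theory
    # faces an ever easier hurdle.
    psych = rejection_prob(true_effect=0.1, n=n)

    # "Physics-style" framing: the theory itself is the null, making a point
    # prediction (1.00) while reality sits at 1.02.  The same test now rejects
    # the theory more and more easily as n grows, so the theory faces an ever
    # harder hurdle.
    physics = rejection_prob(true_effect=1.02 - 1.00, n=n)

    print(f"n = {n:6d}:  P(reject straw-man null) = {psych:.3f}   "
          f"P(reject point-prediction theory) = {physics:.3f}")
```

Both probabilities climb toward 1 as n grows; the arithmetic is identical. The difference is entirely in what rejection means: in the first framing it counts as a win for the favored theory, in the second it kills the theory, so surviving a huge-n test is a genuinely stringent accomplishment.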

This is a big deal. Meehl wrote it in 1967, and his paper got lots of attention (according to Google, it’s only been cited 1500 times, but that’s a lot for a paper published so long ago; also, papers on philosophy don’t get cited that much, compared to papers on methods), but I don’t think most people got the point.

2. How did I miss the message?

Indeed, until I reread the paper the other day (in preparation for including it in the readings for Week 13 of my Rationalizing the World course), I didn’t get it either.

Maybe one reason is that I’m trained in physics and I read Jaynes, so I’d already internalized the use-the-model-you-love-and-put-it-to-the-test-and-when-it-finally-fails-you-can-rejoice-and-think-about-how-to-do-better philosophy (I call it “model checking” rather than “hypothesis testing”). Another reason is that I already knew that Meehl, like me, opposed null hypothesis significance testing (unlike me, he was Saul Bellow’s therapist, but that’s another story), and so every time I’d read that paper, I’d just kinda skimmed it and seen that he was attacking the use of hypothesis tests in psychology research.

I’d understood that Meehl was criticizing null hypothesis significance testing as being confirmationist rather than falsificationist, but what I hadn’t caught was his comparison with physics.

Meehl’s point is not just the commonplace that social sciences have physics envy, that they’re unrealistically looking for law-like relationships that you can’t hope to find when studying human behavior or society; rather, he’s saying that the entire endeavor of data-based science changes if you move from precise, physics-type models to more vague, directional, social-science hypotheses.

3. What to do?

What’s the answer here? The answer is not to tell psychologists (or political scientists, or economists, or sociologists) to form strong, physics-like models and test them. Whatever strong, physics-like models we create, from the median voter theorem to prospect theory to social network models, are gross oversimplifications: they’re guides to thinking that we don’t expect to fit reality.

Rather, I think we need to act like statisticians (ok, maybe you could see that coming!) and consider model building to be an ongoing process. We don’t need to waste our time rejecting straw-man null hypotheses, and we certainly shouldn’t be comparing models based on their p-values or other measures of distance of data from uninteresting nulls; rather, we should build the best models we can, knowing that they’re imperfect, and gather more data until we can do better.

This is not a falsificationist paradigm, nor is it confirmationist. It’s more like “normal science” in the Kuhnian sense. Or, to put it another way, we will be doing small falsifications every step of the way, occasionally slipping into a new paradigm when the old models get too damn clunky. It’s the fractal nature of scientific revolutions.

And that’s the final reason I didn’t catch Meehl’s point on my first (or second, or third) reading of his article. He identifies the problem without offering much of a solution. That’s fine–identifying the problem is the first step. I’m glad I read it again, this time actually reading that first paragraph carefully, as it deserved to be read.