We are in the era of Science Slop

Created with irony by Sora

It feels like there’s been a sea change over the last six months. Most of my colleagues now use LLMs regularly—for bouncing around ideas, proofreading, even doing calculations. A number of high-profile examples have broken into social media: Scott Aaronson credited GPT-5 with a student-level insight in a recent proof; Tim Gowers wrote that he’s crossed a threshold where LLMs save him more time than they waste; and Terence Tao has been all in on AI-assisted maths for some time. AI is clearly going to make a lot of great scientists even greater. But it’s also going to bury many genuine breakthroughs under a torrent of slop.

It’s interesting that, so far, it’s mostly been mathematicians at the forefront of embracing AI’s help. I suspect this is partly because of the promise of being able to verify AI output using formal proof systems like Lean—if it type-checks, it’s correct. Physicists have been slower to claim progress.
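To make the Lean point concrete, here is a toy example of my own (not taken from any of the work mentioned here): the kernel either accepts a proof term or rejects it, leaving no room for a plausible-sounding but invalid step.

```lean
-- Toy illustration: a statement together with its machine-checked proof term.
-- If this compiles, the proof is correct; Lean's kernel verifies every step.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A wrong claim simply fails to type-check:
-- theorem bogus (a b : Nat) : a + b = a * b := Nat.add_comm a b  -- rejected
```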

Which brings me to a paper that’s been making the rounds. My colleague Steve Hsu recently announced on X that he’s “published the first research article in theoretical physics in which the main idea came from a LLM —GPT-5 in this case.” Greg Brockman of OpenAI promptly amplified this, and the story has been picked up as a milestone for AI in science, as part of OpenAI’s broader push.

There’s just one problem: the paper answers a question whose answer we’ve known for 35 years.

UPDATE: Upon closer inspection, there’s a more significant problem. It turns out the AI didn’t just reinvent the wheel—it also pointed it in the wrong direction. I’ve updated the post with italics to reflect this and posted a manuscript on the arXiv. (Dec 7, 2025, 11:35 BST)

The paper, which is published in Physics Letters B, claims to determine whether nonlinear modifications to quantum field theory can be made compatible with special relativity, and it does so in a very complicated way. And while I have no reason to doubt the actual math, I’m pretty confident that Steve published this as an example of what an AI could do, rather than as an example of interesting physics. Which is what makes this a cautionary tale.

You see, already in 1990, Nicolas Gisin and Joe Polchinski showed that nonlinear modifications to quantum mechanics allow superluminal signalling and, even worse, cause the statistical interpretation of the quantum mechanical density matrix to break down. In the comments I explain the proof, because it’s so beautiful and elegant.
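The idea, in compressed form (my paraphrase of the standard Gisin-Polchinski argument, with a standard choice of state; not a quote from either paper):

```latex
Alice and Bob share a singlet,
\[
  |\psi\rangle = \tfrac{1}{\sqrt{2}}\bigl(|0\rangle_A|1\rangle_B - |1\rangle_A|0\rangle_B\bigr).
\]
By measuring in the $z$-basis or the $x$-basis, Alice remotely prepares at Bob's side
either the ensemble $\{|0\rangle,|1\rangle\}$ or the ensemble $\{|+\rangle,|-\rangle\}$,
each state occurring with probability $\tfrac{1}{2}$. Both ensembles have the same
reduced density matrix, $\rho_B = \tfrac{1}{2}\mathbb{1}$. Any \emph{linear} evolution
depends on $\rho_B$ alone, so Bob's statistics are identical either way. A nonlinear
map $N$ generically gives
\[
  \tfrac{1}{2}N(|0\rangle\langle 0|) + \tfrac{1}{2}N(|1\rangle\langle 1|)
  \neq
  \tfrac{1}{2}N(|+\rangle\langle +|) + \tfrac{1}{2}N(|-\rangle\langle -|),
\]
so Bob's local statistics depend on Alice's spacelike-separated choice of basis:
superluminal signalling. Equivalently, two physically distinct mixtures with the
same $\rho_B$ evolve differently, so the density matrix loses its statistical meaning.
```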

Now, the AI does at some point acknowledge the prior work of Gisin and Polchinski, but it doesn’t explain what its new formalism and criterion add. In fact, the criterion the LLM came up with is not a criterion for determining whether non-linear modifications of quantum mechanics violate relativistic covariance (the subject of the paper). Instead, it is a criterion for whether non-local modifications to the Hamiltonian violate relativistic covariance, which of course they do. The LLM’s criterion fails to catch local, non-linear modifications to the Hamiltonian; it does catch modifications that are both non-local and non-linear, but only because they are non-local. Simply put, the criterion the LLM comes up with has nothing to do with non-linear modifications to quantum theory. I’ve posted some details in the comments, but it’s interesting that the LLM’s criterion looks reasonable at first glance and only falls apart under more detailed scrutiny, which matches my experience from the times I’ve tried to use these models.
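To see the distinction the criterion misses, here is a schematic of my own (not taken from the paper and not its notation): a modification can be non-linear without being non-local, and vice versa.

```latex
% Local but non-linear (a Gross-Pitaevskii-type term):
\[
  i\,\partial_t \psi(x) = H_0\,\psi(x) + \epsilon\,|\psi(x)|^{2}\,\psi(x)
\]
% Non-local and non-linear (a kernel ties the evolution at x to the state at
% distant points y):
\[
  i\,\partial_t \psi(x) = H_0\,\psi(x) + \epsilon \int d^3y\, K(x,y)\,|\psi(y)|^{2}\,\psi(x)
\]
```

A covariance criterion that only flags the second kind of term is detecting non-locality, not non-linearity.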

The LLM almost makes a contribution—namely, it considers a claimed loophole in the Gisin-Polchinski no-go theorem, due to Kaplan and Rajendran. Here there could be some benefit to a field-theoretic analysis of the sort GPT performs, but the AI makes little attempt to seriously engage with Kaplan and Rajendran, whose claimed loophole lies outside the scope of the methods it uses. The AI doesn’t end up refuting Kaplan and Rajendran; it merely gestures at a refutation. Typical LLM vagueness.

Now, Steve has done valuable work on designing systems to determine when LLM output can be trusted—he knows better than most that these systems make errors. And I have no reason to believe the paper is *wrong*. It’s just... unnecessary. The AI suggested using the Tomonaga-Schwinger formalism to determine when non-linearities violate relativity, and this produces technically correct mathematics, but the criterion it comes up with has nothing to do with the actual problem it claims to solve.

Nor did it have the scientific taste to say: “Actually, this question was answered decades ago. What specifically are you hoping to learn that we don’t already know?”

This is what I mean by science slop: work that looks plausibly correct and technically competent but isn’t, and doesn’t advance our understanding. It has the *form* of scholarship without the *substance*. The formalism looks correct, the references are in order, and it will sit in the literature forever, making it marginally harder to find the papers that actually matter.

You might think: no problem, we can use AI to sift through the slop. And indeed LLMs are great at finding prior work, especially if you don’t know what keywords to search for. But they currently lack insight into whether any particular paper is correct or interesting. When I fed Steve’s GPT-5 paper to Gemini 3.0, it told me what a breakthrough it was—until I pointed out the prior work, at which point it immediately changed its tune. GPT-5.1, to its credit, actually flagged that the paper made no new contribution. When I asked whether it was written by an LLM, it said it obviously was, and helpfully provided a list of telltale excerpts proving its case.

It neglected to mention that this was stated explicitly in the acknowledgements. And none of the LLMs noticed that the criterion the paper used did not address the theories under discussion.

The problem is that sorting through slop is difficult. Here’s an example you can try at home. A paper by Aziz and Howl was recently published in *Nature*—yes, that *Nature*—claiming that classical gravity can produce entanglement. If you feed it to an LLM, it will likely tell you how impressive and groundbreaking the paper is. If you tell the LLM there are at least two significant mistakes in it, it doesn’t find them (at least, last time I checked, it didn’t). But if you then feed in our critique, it will suddenly agree that the paper is fatally flawed. The AI has pretty bad independent judgement.

This is the sycophancy problem at scale. Users can be fooled, peer reviewers are using AI and can be fooled, and AI makes it easier to produce impressive-looking work that sounds plausible and interesting but isn’t. The slop pipeline is becoming fully automated.

Of course, human researchers also produce slop—only a small percentage of papers are interesting. And yes, we should definitely have higher standards for publication and write fewer papers. But the difference between human slop and AI slop is that human slop helps sustain a community of researchers. And we need this community of researchers to train students, serve as institutional memory, and act as a distributed network of knowledge and collective taste. It is this wider community of researchers that sustains the great scientists. The slop that AIs produce, by contrast, has no benefit as far as I can see.

The rate of progress is astounding. About a year ago, AI couldn’t count how many R’s are in “strawberry”, and now it’s contributing incorrect ideas to published physics papers. It is actually incredibly exciting to see the pace of development. But for now the uptick in the volume of papers is noticeable, the noise is getting louder, and we’re going to be wading through a lot of slop in the near term. Papers that pass peer review because they look technically correct. Results that look impressive because the formalism is sophisticated. The signal-to-noise ratio in science is going to get a lot worse before it gets better.

The history of the internet is worth remembering: we were promised wisdom and universal access to knowledge, and we got some of that, but we also got conspiracy theories and misinformation at unprecedented scale.

AI will surely do exactly this to science. It will accelerate the best researchers but also amplify the worst tendencies. It will generate insight and bullshit in roughly equal measure.

Welcome to the era of science slop!
