“The product of mathematics is clarity and understanding. Not theorems, by themselves.”
—Bill Thurston
My best theorem is one I never wrote down.
It crystallized one bright morning in Lausanne, Switzerland, as I was preparing for my last invited conference talk. The proof felt so obvious—and the result so compelling—that I made the reckless move of editing my slides at the last minute. Time was running out and I could only include the announcement as an informal remark at the bottom of the last slide, instead of stating it as a proper theorem.1
I had already quit academia and founded a machine-learning startup. I knew I would be too busy to write a clean proof and publish it. That was my excuse for being sloppy. I just wrote the remark and abandoned the slide deck as a message in a bottle.
My hope was that some bright young mathematician would pick it up someday and formalize the result as part of a broader theory. If I lucked out with the intrinsic randomness of attribution, I thought, it might even be remembered as the Bessis cellular decomposition theorem.
But that was stupid. By claiming the result, I had killed the incentive for anyone to write it up.
If I had to pick my second best result, it would be Theorem 0.5 in my old preprint on Garside categories. I had high ambitions for this paper, yet I ended up not submitting it anywhere. The creative process had drained me, and I left active research before regaining the courage to clean up the preliminary sections.
For a second best, this theorem is shockingly easy to prove. Once you get the preliminaries right, it only takes a few pages of pretty terrestrial group theory.
As for the preliminaries, they are even easier. All you have to do is plagiarize a dozen or so classical papers in an arcane subfield called Garside theory, replacing the original axiom set with a slightly more general one. If you understand what you’re supposed to do, it is almost impossible to run into serious difficulties—it’s just a giant conceptual find-and-replace bulk edit. But you have to take my word for it, because I balked at producing the hundreds of pages of necessary details.
If you think that the hard part of a mathematician’s job is to prove theorems, let this serve as a counterexample—from the moment I conceived of Theorem 0.5, I knew it was true and that proving it would be straightforward.
What was the hard part, then?
Conjecturing the exact statement and writing it down?
Not even. In this example, this part was equally straightforward.
The hard part was to intuit that there should be “something like Theorem 0.5”, and to come up with a conceptual framework where it became easy to express. Once I got the definitions right, the rest followed more or less organically.
Research mathematics isn’t always like that, but there are miracle days when you just put your skis on and, the next thing you know, you’re accelerating downhill.
Jean-Pierre Serre famously said that writing his revolutionary paper on coherent sheaves didn’t require any thinking. Everything fell so naturally into place that his typewriter generated the 100 pages entirely by itself, as if the article had pre-existed.
But I wasn’t Jean-Pierre Serre and he wouldn’t lend me his typewriter. This is why my brightest mathematical idea never made it to publication.
Do I feel sad about it? Not really. My preprint remains freely available on the arXiv and has already been cited dozens of times, including by some fancy papers. The real innovation wasn’t Theorem 0.5, but the language that made it possible, especially Definitions 2.4 and 9.3—and this language found its way to a 700-page book on Garside theory that filled out much of the missing preliminaries.
To be honest, I also had a selfish reason for sacrificing my most innovative preprint. It enabled me to focus on the more tedious preprint where I used Definition 9.3 as a magic ingredient in the resolution of a classical problem in my domain, the 𝐾(𝜋,1) conjecture for finite complex reflection groups, which permanently elevated my symbolic status as a mathematician.
But, in truth, the David who solved the 𝐾(𝜋,1) conjecture is a social parasite of the much better mathematician, the David who crafted Definitions 2.4 and 9.3.
In the past few months, as I was grappling with the rapidly changing situation around AI and mathematics, I found myself more troubled than I ever expected to be.
In theory, I should feel vindicated and happy. In practice, I am also puzzled, worried, and sad.
The happy part of me sees a genuine revolution and gets excited. The vindicated part has legitimate claims to have prepared for it scientifically and epistemologically. The puzzled part is stunned by the timeline and accompanying frenzy. The sad part feels nostalgic for a lifestyle and value system that it engaged with and walked away from, and which might soon disappear.
The worried part holds the synthesis. I always knew that the general public had a flawed perception of mathematics, but never expected this to become an existential threat for the discipline itself.
In my book Mathematica: A Secret World of Intuition and Curiosity, I framed the misunderstanding as the tension between two versions of mathematics, official math and secret math.
Official math manifests itself as a formal deduction system where you start from axioms and mechanically derive theorems. This is a nerd’s paradise, a world where truth takes binary values, reasoning is either valid or invalid, and there is technically no room for bullshit.
Secret math is the human part of the story—why official math was invented, how we can successfully interact with it, its effects on our brains, and the bizarre mental techniques through which mathematicians continuously expand its territory.
Secret math never made it to the curriculum, because it lacks the defining qualities of official math, and also because it feels peripheral. Official math is cold, hard, logical, objective, and it is rumored to be the language of the universe. Secret math is soft, fuzzy, subjective and, by contrast, it looks like cheap pedagogical backstory.
No wonder professional mathematicians have such a dissociative view of their job.
The first rule of the Intuition Club is: you don’t talk about the Intuition Club. The second rule is: if you really want to talk about intuition, make it sound casual and accessory, because we ain’t the psychology department. The third rule is: definitions are worth zero points, expository work counts negative, and the best jobs should always go to the people who proved the hardest theorems.
If you think I’m exaggerating, here is what G. H. Hardy wrote in his celebrated (yet insufferable) mathematical autobiography:
There is no scorn more profound, or on the whole more justifiable, than that of the men who make for the men who explain. Exposition, criticism, appreciation, is work for second-rate minds.
This is peak dissociation. Behind closed doors, mathematicians are quick to complain about Hardy’s curse. They insist on the importance of teaching, even for their own comprehension of the subject matter. They lament the system’s pathological obsession with theorem-proving priority, while everyone knows the hard work often takes place outside of that loop, when trying to make sense of existing results. Yet, in public, they are bound by the honor code of mathematicians. Prove theorems and shut up!
There is one exception, though. Once you get the Fields medal, you are free to say whatever you want.
Bill Thurston, the 1982 Fields medallist, was a spectacular dissenter. Two years before his death, he took part in an extraordinary exchange on MathOverflow, in response to this question posted by an insecure undergrad:
What can one (such as myself) contribute to mathematics?
I find that mathematics is made by people like Gauss and Euler—while it may be possible to learn their work and understand it, nothing new is created by doing this. One can rewrite their books in modern language and notation or guide others to learn it too but I never believed this was the significant part of a mathematician work; which would be the creation of original mathematics. It seems entirely plausible that, with all the tremendously clever people working so hard on mathematics, there is nothing left for someone such as myself… Perhaps my value would be to act more like cannon fodder? Since just sending in enough men in will surely break through some barrier.
Thurston jumped in:
It’s not mathematics that you need to contribute to. It’s deeper than that: how might you contribute to humanity, and even deeper, to the well-being of the world, by pursuing mathematics? Such a question is not possible to answer in a purely intellectual way, because the effects of our actions go far beyond our understanding. We are deeply social and deeply instinctual animals, so much that our well-being depends on many things we do that are hard to explain in an intellectual way. That is why you do well to follow your heart and your passion. Bare reason is likely to lead you astray.2 None of us are smart and wise enough to figure it out intellectually.
The product of mathematics is clarity and understanding. Not theorems, by themselves. Is there, for example any real reason that even such famous results as Fermat’s Last Theorem, or the Poincaré conjecture, really matter? Their real importance is not in their specific statements, but their role in challenging our understanding, presenting challenges that led to mathematical developments that increased our understanding…
Mathematics only exists in a living community of mathematicians that spreads understanding and breathes life into ideas both old and new. The real satisfaction from mathematics is in learning from others and sharing with others. All of us have clear understanding of a few things and murky concepts of many more. There is no way to run out of ideas in need of clarification…
Here we need to take a short metaphysical break, because it is all too easy to brush Thurston’s words off as “feel-good” or “woke”.
In my first Substack post, I (half-jokingly) declared that we had been wrong about mathematics for 2300 years, stuck in a false dilemma between formalism (“mathematics is a meaningless game of formal symbols”) and Platonism (“mathematics captures properties of actual entities living in the perfect world of ideas”).
My proposed conceptualist resolution is a rephrasing of Thurston’s view: mathematics does rely on a meaningless game of formal symbols, but we only play this game because we project meaning onto it.
Meaning is a cognitive phenomenon—a product of our neural architecture—and not a direct access to transcendence.
When we “do math”, we manipulate formal expressions and gradually develop an intuitive feel for what they represent, as if they were pointers to objects that “existed” in a Platonic sense. Platonists take this neuroplastic side-effect at face value. Formalists view it as accessory. Conceptualists like me recognize mathematics as a critical cognitive infrastructure of the human species.
A natural question is why the conceptualist resolution took so long to emerge. One reason is that it goes against the prevailing spiritualist worldview, which refuses physicalist interpretations of mathematics.
It also goes against the honor code of mathematicians. Hardy’s curse is so powerful that even Thurston found it hard to overcome. When multiple MathOverflow users thanked him for his take, he noted in reply:
Thanks for the comments. I try to write what seems real. By now, I have no cause to fear how I will be judged, which makes it much easier for me. It’s gratifying when my reality means something to others.
But then, how could such a toxic honor code survive for so long?
The answer is simple. The honor code was useful to mathematics as an academic discipline. It helped it stay exceptionally healthy and meritocratic, as noted in the epilogue of my book:
This system has its merits. It reduces arbitrariness and helps mathematicians guard against complacency and nepotism. When a discipline deals with eternal truths, it offers a neat way to evaluate careers.
The honor code also served as a guide to researchers themselves, when evaluating new ideas and new directions of research. Concept-building and problem-solving, the two facets of mathematics, are in a symbiotic relationship, as remarked by 2018 Fields medallist Peter Scholze:
What I care most about are definitions. For one thing, humans describe mathematics through language, and, as always, we need sharp words in order to articulate our ideas clearly… Unfortunately, it is impossible to find the right definitions by pure thought; one needs to detect the correct problems where progress will require the isolation of a new key concept.3
This is how the system worked for millennia. Mathematicians created value by introducing new concepts, but the rule was that only theorems could put bread on the table. The deal was fine because the two aspects almost always walked hand in hand. David, the social parasite who claimed credit for the 𝐾(𝜋,1) conjecture, was the same person as the David who crafted Definitions 2.4 and 9.3.
Solving a big conjecture was a cryptographic proof that you had come up with a genuine conceptual innovation.
I am using the past tense because this is no longer the case. There is a structural vulnerability in the honor code of mathematicians and AI has started exploiting it in a systematic manner.
The trigger for this post was a speech by Geoff Hinton, which caught me off guard:
I agree with Demis Hassabis, the leader of DeepMind, who for many years has said AI is going to be very important for making scientific progress…
There’s one area in which that’s particularly easy, which is mathematics, because mathematics is a closed system…
I think AI will get much better at mathematics than people, maybe in the next 10 years or so. And within mathematics, it’s much like things like Go or Chess that are closed systems with rules...
I was used to the general public being profoundly wrong about the nature of mathematics. But I wasn’t prepared for a Turing awardee and Nobel prize winner comparing it with Go and Chess.
I wrote a short response on X and tried to move on, but it kept troubling me.
Then it all clicked into place. About a year ago, I had been approached by a young mathematician friend who had done his PhD in my domain. He was thinking about launching an “AI for pure math” business and I mentored him for a while.
Like him—and like Hinton and Hassabis—I was fully convinced that AI was about to transform mathematics and science in general. But I was unsure about the business model and minimum viable product.
Mathematicians may look like Luddites, but they rarely are. They love pen and paper, blackboard and chalk, but they jumped on Donald Knuth’s typesetting revolution. A century ago, they chose to rebuild their entire knowledge stack on a new operating system, set theory, that promised massive gains in reliability and scalability. A few decades later, they recognized that there was no real difference between a mathematical proof and a calculation, and set out to build the first computers. Deep learning, with its heavy use of linear algebra and stochastic gradient descent, is a brainchild of mathematics.
In the 1970s, when Kenneth Appel and Wolfgang Haken built a computer-assisted proof of the four-color theorem, this opened an intense debate on the epistemic nature of such proofs and their admissibility in peer-reviewed journals. Although, to be honest, there never was much suspense—the barbarians won, because there were barbarians on both sides.
Computers had always been part of my mathematical life4 and the promise of AI and autoformalization had long felt irresistible to me. This is what got me excited when my friend reached out and asked for my advice.
I started looking at the “AI for math” space and couldn’t understand what was going on. Why were these startups raising so much money? Pure mathematics is such a tiny market. The investments felt disproportionate.
My preferred strategy, the one I would have pursued, was to create the Wolfram Research of the AI age. Mathematics-enabled science and technology is a much larger market than pure mathematics and, as Wolfram demonstrated, there is room for simplifying and productizing the experience of interacting with mathematical objects. The users love the product and it is sticky.
But my friend insisted he wanted to do something specifically about pure math.
I didn’t know what to say, because I was stuck. The only useful products I could think of were literature spotters and interactive proof assistants—hard to package, hard to price, and even harder to sell. I could see a long term business strategy, but it was one I wouldn’t touch with a ten-foot pole—becoming the Elsevier of the AI age, the most hated brand in science, an arm-twisting extractive monopoly that repackages the mathematical commons into a mandatory experience.
There was a third strategy, though. But it was risky and, like the previous one, it did require a certain degree of cynicism. I’d call that plan the luxury acquihire: 1/ build a useless product that is striking enough, 2/ give the impression that you have solved a major scientific problem, 3/ pray for a quick M&A by a tech giant or a major AI lab.
Still, the numbers didn’t add up. The “AI for math” startups were rumored to be raising hundreds of millions. There must have been a smarter investment thesis, which I was failing to comprehend.5
Then I heard that Google was leading a massive effort to solve the existence and smoothness problem for the Navier–Stokes equations. I thought OK, I get it, that’s a Millennium Prize problem. But, wait, that still doesn’t make sense—the payout is one million dollars, peanuts. As one great mathematician remarked to me, Google likely mobilized more brainpower on this single effort than the entire community ever did.
It only started to make sense after I heard Hinton’s speech.
If mathematics really was a closed system—or if this is what all the stakeholders around the table are willing to believe—then the investment pitch becomes trivial: “DeepMind solved Go and Chess, we’re going to solve mathematics!”
At a time when the leading AI labs are betting trillions that humans are soon to become obsolete, the promise of “solving mathematics”, the crown jewel, the pride of the human race, is simply irresistible.
On February 5th, a team of eleven high-profile mathematicians (including Martin Hairer, the 2014 Fields medallist) announced the First Proof project and released a first batch of ten “research-level math questions”:
This manuscript represents our preliminary efforts to come up with an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions. After letting these ideas ferment in the community, we hope to be able to produce a more structured benchmark in a few months.
One of our primary goals is to develop a sophisticated understanding of the role that AI tools could play in the workflow of professional mathematicians. While commercial AI systems are undoubtedly already at a level where they are useful tools for mathematicians, it is not yet clear where AI systems stand at solving research level math questions on their own, without an expert in the loop.
From a purely scientific perspective, there is nothing to complain about. These are incredibly smart people, engaging a real-world controversy with an open-minded attitude and a creative approach.
The First Proof team was doing everything right—and this is what terrified me.
But before I explain, I must reiterate that I have a very high opinion of this project. The team represents the mathematical community at its best, people driven by curiosity and integrity, willing to experiment outside of their comfort zone, and they did come up with genuinely good ideas.
Daniel Litt wrote an excellent essay, Mathematics in the library of Babel, on the First Proof project and his own first-hand assessment of the AI-for-math situation. His perspective is that of a radical non-Luddite, a pure mathematician who has engaged with LLMs for years and is genuinely impressed by the recent progress.
He was surprised by how many of the First Proof open problems ended up being solved by the teams at Google, OpenAI, and others. By his own count:
It seems likely that somewhere between 6 and 8 [of the 10] problems were solved correctly if one combines all attempts.
There are serious caveats, though:
The models (and the humans supervising them) generated an enormous amount of garbage, including some incorrect solutions claiming to be formalized in Lean. Even the best models/scaffolds seem not to be able to reliably detect when they are producing nonsense.
Even when AI-for-math systems operate autonomously, which few in the current crop actually do, they still require humans to intervene upstream and downstream, if only to assess the results and filter out the junk. This isn’t anecdotal. The AI labs are investing so much human intelligence into these projects, much more than any real-life mathematician can ever mobilize, that the delineation is never entirely clear. The most damning illustration is this:
It was not clear to OpenAI which of their solutions to First Proof are correct.
In other words: without the pro bono effort of the good old academic community, they might have never known. (Litt was himself a contributor.)
In truth, this also applies to human-generated proofs. But there is a fundamental nuance. In the human way of doing math, theorem-proving and concept-building walk hand in hand, which forces proofs to be intelligible (if only to their authors).
This is where the metaphysics comes back to bite. If mathematics was just a formal game of meaningless symbols, intelligibility would be a vacuous notion. The reality of mathematical practice forcefully points in the opposite direction. As it happens, published research is full of bugs, but these bugs tend to be contained and fixable, precisely because human-generated proofs are meaningful and (almost always) directionally correct—two notions that are impossible to reconcile with the formalist worldview.
This isn’t a cosmetic nuance. This is a mandatory condition for mathematics to exist as a sustainable endeavor.
As noted by Litt, the essential characteristic of being “truth-seeking” was missing from the AI solutions to the First Proof questions:
Many correct solutions are very poorly written, and their correctness is exceedingly difficult to check because of this... In [human solutions] the main ideas, obstructions to be overcome, etc., are usually identifiable; in [AI solutions] they are often completely unclear. And in the course of writing their solutions, the human authors often develop useful new objects, terminology, etc. to capture what they’re doing, while the models usually just plow ahead.
This is where the second leg of the math-for-AI revolution comes into play: autoformalization, the capability to transform the mildly informal style of human-produced proofs—and the outputs of LLMs trained on them—into bulletproof, machine-verifiable logical derivations, expressed in specialized languages such as Lean.
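To give a concrete sense of what that target language looks like, here is a minimal Lean 4 sketch, using Mathlib’s `Even` predicate. The lemma is a deliberately trivial illustration of mine, not drawn from First Proof or from any autoformalization effort: the point is only that every step must elaborate and type-check, leaving no room for hand-waving.

```lean
import Mathlib

-- "The sum of two even natural numbers is even", as machine-checkable code.
theorem even_add_even' {a b : ℕ} (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨x, hx⟩ := ha     -- a = x + x
  obtain ⟨y, hy⟩ := hb     -- b = y + y
  exact ⟨x + y, by omega⟩  -- a + b = (x + y) + (x + y)
```

A proof in this style is either accepted by the kernel or rejected; there is no “directionally correct”. That is precisely what makes autoformalization attractive as a correctness filter—and, as discussed below, why correctness alone turns out not to be enough.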
On paper, this can address the issue of correctness and remove the need for human validation. But, as noted by Litt, “there was only one credible formalized solution to any problem, impressively and manually orchestrated by Tom de Groot.”
This is of course an evolving situation. The stakes are high and the investment massive. LLMs, scaffolding, and autoformalization, are all making steady progress.
What would happen if, a year from now, the First Proof team released another set of 10 problems of equivalent difficulty? Litt doesn’t answer this specific question, but he expects AI to autonomously produce results “at a level comparable to that of the best few papers” within the next few years.
I share his sentiment. In my view, the likely outcome would be that the leading AI labs would score a perfect 10, coming up with fully automated & fully correct solutions to all problems.
Does that mean that AI will have “solved mathematics” by early 2027?
Of course not. There are three additional caveats that I am sure all insiders will have already spotted, but non-specialist readers may have missed.
The first one is straightforward—the First Proof “research-level questions” were neither profound nor difficult. They were closer to technical lemmas, well-calibrated intermediate subproblems that occur in the course of proving a theorem and are typically handled in a few paragraphs or pages. There is an ocean between a technical lemma and a serious paper, another ocean between a serious paper and a breakthrough, another ocean between a breakthrough and a Fields-medal-level contribution, and yet again several oceans above that.
First Proof is now working on a second batch, which is likely to include harder problems.
From a technical perspective, they did pick the right level for their first batch. This only became apparent in hindsight: the final score was above 0/10 and below 10/10, in the sweet spot for their stated goal of finding an “objective and realistic methodology for assessing the capabilities of [current] AI systems”.
Yet I think that releasing a benchmark that didn’t include serious problems was extremely dangerous. The general public doesn’t read the fine print, and may not even know what a lemma is. If Google or OpenAI had scored a perfect 10, the headline would have read: “GAME OVER: The world’s top mathematicians challenged an AI with 10 research-level problems; the AI nuked each and every one of them.”
Similar distortions are being made every single day on social media, feeding the very confusion that First Proof was trying to dispel.
The second caveat is much more profound. It is also extremely subtle and hard to communicate to the general public, and this combination is creating a tricky situation for the mathematical community.
The problem with unintelligible proofs goes way beyond correctness, and cannot be resolved by autoformalization alone: even if correct, unintelligible proofs aren’t accretive to the mathematical corpus.
I know this sounds barbaric. Let me explain with an example. A few weeks ago, Math Inc, one of the best-funded AI-for-math startups, produced a Lean formalization of Maryna Viazovska’s spectacular work on the sphere packing problem in dimensions 8 and 24, the results that earned her the Fields Medal in 2022. That was impressive in its own right: never before had theorems of this caliber been autoformalized. Yet this clear success was met with massive pushback from the “formal mathematics” community, the very people who lead the effort to port “human mathematics” into machine-verified code.
Luddites?
Well, it’s more complicated, as their actual conversation makes clear. The thing is that autoformalization isn’t a full solution to the problem of formalizing mathematics, just like Tesla’s Full Self-Driving isn’t a full solution to the problem of driving cars.
Yes, I know, that sounds counterintuitive. This is why outsiders are likely to miss the nuance and label the pushback as Luddism.
A clear explanation can be found in Alex Kontorovich’s account of his own learning curve with formalized mathematics. In a nutshell: Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics. It exposes clean APIs and abstractions, without which no autoformalization could take place. By contrast, Math Inc’s autoformalized proof of Viazovska’s results exposes no intelligible interface. Who in their right mind would merge a 200,000-line unaudited vibe-coded blob into the master branch of global human science?
Kontorovich has a great expression for what is missing—canonization:
By canonization, I mean the process of taking a local, one-off formalization and turning it into library mathematics: general, reusable, coherent, efficient, and compatible with the rest… Canonization often changes the picture itself: the definitions, the abstractions, the API, and sometimes even the statement… This is extremely difficult and time-consuming.
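In Lean terms, the difference might look like this toy contrast. Both lemmas below are hypothetical illustrations of mine, not Mathlib code: the first proves exactly the special case one proof happens to need, while the second restates the reusable fact with the general hypotheses and conventions the rest of a library could build on.

```lean
import Mathlib

-- A one-off formalization hard-codes the special case it needs...
theorem helper_for_my_proof (n : ℕ) : 2 * n ≠ 7 := by omega

-- ...while canonization isolates the reusable statement behind it,
-- phrased in terms the rest of the library can invoke:
theorem two_mul_ne_odd {n m : ℕ} (hm : Odd m) : 2 * n ≠ m := by
  obtain ⟨k, rfl⟩ := hm  -- m = 2 * k + 1
  omega
```

Multiply this triviality by 200,000 lines and the cost Kontorovich describes comes into focus: extracting the second kind of lemma from a blob of the first kind is exactly the “extremely difficult and time-consuming” work that no one is currently paid, or socially rewarded, to do.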
But this, again, might be mistaken for Luddism. “General”, “reusable”, “coherent”, “efficient”… Aren’t these reactionary arguments made up by dumb humans with narrow context-windows? Why would an AGI care about that? Isn’t autoformalization, by design, the definitive solution of all problems with vibe-coding?
Doesn’t proven code meet the highest conceivable standards for software? Why are you constantly shifting the goalposts?
The software analogy helps understand the core issue. For normal software, quality is first and foremost a pragmatic notion—good code should compile, run smoothly in production and satisfy user needs. But Lean code never runs anywhere. It is just a library that sits in a repository and might be imported in the future, who knows when, in the process of formalizing another theorem.
You cannot “ship and let the users do the QA for you” because your only user is the future of mathematics. This is why canonization, from Euclid to Zermelo-Fraenkel to Bourbaki, has always been a core concern of the formalist school.6
We’re back to the metaphysics. The problem with unintelligible mathematics isn’t that it might be false. It is that it is literally meaningless, in the sense that it doesn’t compile on the only hardware that is currently able to make sense of it and appreciate its value—the human brain.
This, of course, could change in the not-so-distant future. There is nothing magical about the human brain. Artificial sense-making and truth-seeking architectures will certainly emerge, and at some point they will probably surpass humans on all aspects of mathematical creativity, including the canonization of formal proofs (which, from an algorithmic complexity perspective, is a mandatory requirement for all forms of mathematics, whether human or superhuman).
Yet this is absolutely not our present—nor a straightforward continuation of current trends—and I don’t think the conditions are met for a fruitful debate. In any case, if humans are still around, they will still feel the need to understand the world, and this will continue to drive them to engage with mathematics.
No one knows the timeline to AGI. Meanwhile, the autocanonization capabilities of frontier models barely exceed zero and Math Inc’s autoformalized proof of Viazovska’s theorems sits as an orphaned 200,000-line blob.
This is what made the Mathlib community so angry. They had been working on a multiyear project to formalize Viazovska’s work. Math Inc jumped in on this collective effort, leveraged prior insights, then abruptly went silent, until they made their spectacular announcement.
Is this necessarily bad news? Now that the brute-force autoformalization is done, why can’t the Mathlib community refocus on the value-creating canonization?
Because of Hardy’s curse and the honor code of mathematicians. Math Inc captured the prize (“first formalization of a Fields-medal-level theorem”) and there is no social reward left for cleaning up after them. Hence this comment by Patrick Massot, a non-Luddite expert in formalized mathematics:
I think the situation is pretty clear: AI companies, and especially Math Inc, will indeed thoroughly bomb this area to turn it into a giant radioactive wasteland that will never be able to sustain life again, so we will never get the benefits expected from formalization (improved understanding and accessibility). I strongly advise young people to contribute to less shiny projects that are less likely to be destroyed.
What makes the situation really tricky is that unintelligible formal proofs may hold significant residual value, even if they aren’t accretive to the canonized corpus.
And, to be honest, the issue existed well before AI, with the four-color theorem, with the classification of finite simple groups, or with Tom Hales’s monumental work on the Kepler conjecture (which led him to seek a formalized proof).
The likely outcome is that formalized mathematics will now develop in two separate layers: an intelligible layer embodied by Mathlib, and an unintelligible layer we might call Mathslop, a library of results that are known to be correct via proofs that no human has ever understood.
When the First Proof problems were released, my greatest fear was that the big AI labs were going to one-shot every single one of them, for the wrong reasons—because the answers might have been available somewhere in one shape or another.
Litt notes that 2 of the 10 problems had solutions7 in the existing literature (which, no surprise, most LLMs were able to leverage), while a “sketch of proof” was available for a third problem (which, nevertheless, no LLM was able to solve).
But what about the remaining 7 problems? Were they really open? In what sense?
My view is that it is impossible to say for sure, due to a structural feature of the mathematical corpus which, no doubt, is going to play a central role in the AI-for-math debate. In fact, I suspect that our legacy notions of “creativity” and “innovation” are ill-founded, and that AI is about to teach us brutal lessons about them.
Most mathematicians are intuitively aware of this structural feature, although its exact shape and size are impossible to chart. I haven’t seen any serious attempt to theorize it and the feature has no agreed-upon name—let me call it the Overhang.
I expect the Overhang to be absolutely gigantic.
I met it on several occasions in my career. Most encounters were fairly casual, except for one mystifying experience that took place as I was doing my best work, right after I came up with the core idea in my Definition 9.3. That was an entirely new concept, built on a stack of new ideas that I had just come up with, unlocking the solution to a classical conjecture—subjectively, that felt like a stroke of genius.
As I was “canonizing” my most creative idea ever, I made a shocking discovery. My proud invention, divided Garside categories, was essentially equivalent to a seemingly esoteric construction in an entirely different branch of mathematics, namely the Bökstedt-Hsiang-Madsen subdivision of Connes’s cyclic category in algebraic K-theory.
In other words, my one genius idea was déjà vu. Or was it? Two decades later, I continue to view this moment as the high point of my career. I don’t really care if my idea was or wasn’t “original”, as I doubt this is the most salient notion. What struck me was the sudden flash of meaning, which was otherworldly. (A clear cognitive sign of long-term value.)
In any case, a good LLM might have spotted the syntactic analogy in the preliminary phases of my work, when I still had no clear idea where it was leading. Then it could have scooped me by “front running.”
The existence of profound correspondences between seemingly unrelated topics is one of the most celebrated aspects of mathematics, a source of joy and marvel. It might be the purest expression of mathematical beauty. Descartes changed the world by noticing that algebra and geometry were essentially the same, and by building a bridge between them. Once you “invent” Cartesian coordinates, many legacy problems become trivial—and interesting new problems instantly emerge.
But modern mathematics is so fabulously complex that most correspondences go unnoticed.
Sometimes connecting the dots leads to initial mystery rather than discovery. A striking example is John McKay’s famous remark that 196 883 + 1 = 196 884. The number 196 883 appeared in the study of the Monster, while the number 196 884 appeared in the study of modular forms, two areas of mathematics that were mutually alien. The conjectural vision that they should be related seemed so ludicrous that it was labelled “moonshine”—hence the name of the resulting theory, Monstrous Moonshine. Richard Borcherds received the 1998 Fields medal for proving it correct.
But sometimes connecting the dots is enough to solve a major problem. The last mile of proving a conjecture is often about realizing that the missing bit was already present somewhere in the literature.
The Overhang consists of the unrealized capital gains of past mathematical creativity, the latent value from connecting the dots in the existing corpus. It is a dividend of canonization. Mathematician X states problem A, mathematician Y crafts concept B, then mathematician Z notices that B trivially solves A and “captures” the social reward. But in the process of capturing the reward, Z usually introduces new concepts and new open problems, reinjecting latent value into the Overhang.
LLMs can be trained on the entirety of the mathematical corpus. Thanks to their phenomenal memorization and pattern-matching abilities (without always being able to map out their associative logic and attribute due credits), they are in a unique position to harvest the Overhang. By contrast, professional mathematicians have typically read a few hundred articles in their career, out of millions of existing references, less than 0.1% of the total.
This will lead to great discoveries, which is unambiguously exciting. But it could also lead to a sad new deal, where human slaves painfully curate the Overhang while AIs systematically beat them at the finish line.
We are very far from it, though, which in and of itself is disorienting. Litt adds this sharp remark:
The mystery is this: a human with these capabilities would, almost certainly, be proving amazing theorems constantly. Why haven’t we seen this from the models yet? What are they missing?
The answer seems quite obvious—current AI systems and humans process mathematics in entirely different ways. The best models are insanely stronger on certain aspects, which necessarily implies that humans are still insanely stronger on others.
AI is at the same time superhuman and subhuman, depending on how you look at it.
This is a compliment to both AIs and humans. This plurality of aspects makes the whole “benchmarking” enterprise extremely fragile, if not ludicrous. There is a real possibility that AI will achieve problem-solving supremacy long before it achieves concept-building adequacy. What are we going to call that? “Superintelligence”? Seriously?
This is my core issue with the First Proof approach. It is a meaningful benchmark for working mathematicians wondering what AI can do for them in the present, from within the constraints of an obsolete honor code, at a moment when they urgently need to break away from it.
Why don’t we construct benchmarks that are fairer to humans? Because we can’t. Because the true value of mathematics, the collective and individual elevation of our worldviews, is ill-defined and intangible. This intangibility was the raison d’être of the honor code.
For millennia, we agreed to benchmark human intelligence only on its problem-solving facet, having found that it was the best objective proxy. Theorem proving is so inordinately difficult for our cognition that the only progress path was through patient concept building and neuroplastic internalization of these new concepts.
Yet this was only ever a proxy. The thing we really care about is different in kind. Industrial robots are far stronger than humans, yet we still go to the gym. Blenders have been outchewing us for over a century, yet we still don’t eat exclusively through straws.
I know the optics are horrible. Declaring that math is first and foremost about comprehension, an unbenchmarkable aspect, sounds like an all-too-convenient excuse. Yet this isn’t something I am pulling out of my hat at the last minute. Here is what Thurston wrote in 2011:
Mathematics is commonly thought to be the pursuit of universal truths, of patterns that are not anchored to any single fixed context. But on a deeper level the goal of mathematics is to develop enhanced ways for humans to see and think about the world. Mathematics is a transforming journey, and progress in it can be better measured by changes in how we think than by the external truths we discover.
I cited this passage in my book, written five years ago, before the AI storm started. I also cited a 1628 rant by Descartes against the Ancient Greek mathematicians who, “with a kind of pernicious cunning”, had “suppressed” the “true mathematics” (which he identified with the inner cognitive methodology) and only published “childish and pointless” stuff, “the fruits of their method, some barren truths proved by clever arguments, instead of teaching us the method itself.”8
Terry Tao made a similar comment in his recent interview with Dwarkesh Patel:
In math, the process is often more important than the problem itself. The problem is kind of a proxy for measuring progress.
He also made this remarkable prediction:
I think within a decade, a lot of things that mathematicians currently do—what we spend the bulk of our time doing and a lot of stuff we put in our papers today—can be done by AI. But we will find that that actually wasn’t the most important part of what we do.
The feedback on social media was overwhelmingly positive, as is often the case with Tao’s public interventions. Yet I did notice some unusual dissent, specifically on the notion that problem solving in itself isn’t all that valuable.
To the AGI prophets, this read as a refusal to see the writing on the wall. There were some actual sneers.
On March 3rd, Jeremy Avigad, the first director of the newly-created Institute for Computer-Aided Reasoning in Mathematics, posted a five-page essay on Mathematicians in the Age of AI.
It contains a detailed account of the Math Inc vs Mathlib incident, of which Avigad was a prime witness. He too is a radical non-Luddite, ideally positioned to observe the rapidly changing situation:
We have to face up to the fact that AI will soon be able to prove theorems better than we do…
We need to remember our strengths: mathematicians are problem solvers and theory builders extraordinaire. Rather than fight the use of AI in mathematics, we should own it. It is not enough to keep up with current events and design benchmarks for AI researchers; we need to play an active role in deploying the technology and molding it to our purposes.
Regarding the First Proof initiative, Avigad admits having been “surprised” that Google could solve “six out of the ten problems.”
As noted above, I would have put it differently—I felt more “relieved” than “surprised.” But I was “surprised” too, in the sense that I had no intuition of how difficult the problems really were and no clear anticipation of how many would be solved.
What I did find disconcerting was Avigad’s congratulatory comment on the interest sparked by First Proof:
A few AI developers took the bait
I am not sure if this is irony or naivety. To people familiar with the tech ecosystem, it was pretty clear that the First Proof problem set would trigger a five-alarm scramble at all the leading AI labs. Who could miss out on an opportunity like that?
Here is how Greg Brockman, the president of OpenAI, commented on it:
“First Proof” (http://firstproof.org) is a very cool concept and approach. congrats and thank you to the team who put it together!
The timeline is remarkable. Brockman’s comment came on February 14th, as OpenAI was on a roadshow to secure the largest private technology fundraise in history. Even more remarkable, Brockman wasn’t the only member of OpenAI’s leadership team to get “distracted” by geeky side-projects in the midst of these incredibly tense weeks. A few days before, Sam Altman had found the time to do a bit of coding on his own and express his sadness at realizing that Codex had rendered him “useless.”
But was this ever a distraction? You could see this as a massive fish taking the bait, or a fox thanking the hens for handing the keys to the henhouse.
To be clear, I don’t think OpenAI—or any of the other AI labs—was doing anything evil. The First Proof team hadn’t left them any choice. Once the problems had been released, they had to compete, and they had to go all in, because they couldn’t afford to be outclassed.
In fact, all this is normal. The AI labs have genuinely exciting tech, and theorem-proving is a genuinely exciting use case.
The only problem is what we are all tempted to read into it.
This is what made Hinton’s speech so troubling. He presented mathematics as a low-hanging fruit, a “particularly easy” terrain for AI, because it is “a closed system with rules,” “like Go and Chess.” Yet Go and Chess aren’t ordinary closed systems. They are actual games with winners and losers. More to the point, they are finite two-person games with perfect information and, by a 1913 theorem of Zermelo, they admit optimal strategies.9
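Zermelo’s argument is just backward induction, and it is easy to watch it at work on a toy example. The sketch below is my own illustration (not Hinton’s or Zermelo’s): it solves a tiny subtraction game in a few lines of Python. From n stones, the player to move removes 1 or 2, and whoever takes the last stone wins. Because the game is finite with perfect information, every position is determinately a win or a loss for the player to move, and a mechanical walk of the game tree settles which.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def player_to_move_wins(n: int) -> bool:
    """True if the player to move has a winning strategy with n stones left."""
    if n == 0:
        return False  # no move available: the previous player took the last stone
    # A position is winning iff some legal move (take 1 or 2 stones)
    # leads to a losing position for the opponent -- backward induction.
    return any(not player_to_move_wins(n - k) for k in (1, 2) if k <= n)

# Losing positions for the player to move are exactly the multiples of 3.
print([n for n in range(1, 10) if not player_to_move_wins(n)])  # [3, 6, 9]
```

The “optimal strategy” whose existence Zermelo’s theorem guarantees is, in this toy case, a one-line rule: always move to a multiple of 3. The point of the contrast in the paragraph above is that nothing in mathematics plays the role of this win/loss labeling.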
But if AI ever “wins” at mathematics, then someone else will have to be the “loser,” and the only other player in the room is human thought.
Who cares if mathematics isn’t an adversarial game—and we are still figuring out the rules. When the news is this sensational, it becomes fit to print.
Here is what the nuclear scenario looks like. A year from now, an AI lab assembles a secret A-team, throws a few billion dollars of compute at the Riemann hypothesis, and comes up with a 2,000,000-line Lean proof. What are mathematicians going to say?
That the proof is unintelligible? Right, but who cares? Isn’t the whole of math already unintelligible to most people?
That the theorem was still proven by the humans in the loop? Yes and no—and in that specific case I would probably vote no.
That a few billion dollars represents decades of the entire mathematical sciences budget at the National Science Foundation? Correct, so what? And what if the conjecture happens to be much easier, accessible with only nominal resources?
That the conjecture itself was still produced by a human brain, along with the entire mathematical stack underneath, starting from the very language in which the conjecture was written, the concepts of infinite sums, complex numbers, meromorphic functions and analytic continuation, the interpretation of the conjecture as a profound statement about the distribution of prime numbers, and the prophecy that it should be true? And that, at this point, humans are the only beings who care about it?
This objection is profound and valid. But when you look like a sore loser, nobody cares what you say.
I don’t expect this exact scenario to unfold. But I do expect to see, in the near future, one or more claims that AI has “solved math”, or “achieved mathematical supremacy.” And, given the current state of the conversation, I expect the actual evidence supporting these claims to be framed and/or understood as a defeat for human intelligence.
If Geoff Hinton doesn’t get it, then 99.9999% of people won’t get it either.
Whatever they will feel the urge to say when the disaster happens, mathematicians should start saying now. It is probably too late to entirely dodge the bullet, but at least they have a little time to anticipate and coordinate.
The clearest sign of urgency is the growing disaffection of younger mathematicians, many of whom are joining the AI labs. Full disclosure: I understand their choice, and would likely do the same if I were in their shoes.
Meanwhile, non-tenured academics are showing signs of despair. Ashvin Swaminathan, a postdoctoral fellow at Harvard, commented that Litt’s essay, which he agrees with, left him “with a lot of questions about the longevity of research mathematics as a profession outside of the most elite institutions.”
Swaminathan’s thread is a great read that touches on an essential aspect of mathematics—teaching:
One might ask whether research mathematicians can support their work through teaching. But that ignores the dramatic impact that AI is having on teaching. A university degree may very soon be worthless, given how many students use AI to do their work for them. There’s a reason why the number of tutoring requests I’ve received, and the number of students who attend office hours, has decreased dramatically. There’s a reason why students do incredibly well on homeworks and far worse on exams. Many kids are graduating with no real skills.
Let me emphasize that this is the same exact issue as in research. Problem solving is just a means to an end. The core payout of pure mathematics is the neuroplastic elevation of our worldview, and we can’t get there by copy-pasting the result.
The false notion that mathematics is only about proving stuff (or finding the right results) is toxic at all levels, starting in primary school.
In particular, the frequently heard claim that “AI can already perform PhD-level research”—if and when it becomes true—isn’t as compelling as it may seem. PhD students are students. Reasonable advisors direct them to problems with accessible strategies. In fact, most published research sits comfortably within the Overhang, where it serves a dual purpose, part research in the true sense, and part cognitive maintenance of the research community. That is not a dig at my former colleagues, but a testament to the genuine difficulty of frontier mathematics, which only exists as the spearpoint of a collective endeavor focused on intelligibility.
If theorem-proving remains their only official currency, all research mathematicians run the risk of becoming demonetized in the course of the next few years.
Is there anything the mathematical community can do?
I don’t see any miracle solution, but there are obvious approaches to limiting the damage. The first one is to finally come clean about the nature of mathematics.
All the active mathematicians I have cited, from Tao to Avigad, from Litt to Kontorovich, are careful to note that there is something in mathematics that goes beyond theorem proving. Yet this is usually framed as a passing remark, not a major shift in the narrative.
Well, this is a major shift in the narrative, and the public will never take it seriously if it isn’t properly explained. Right now, it looks like an excuse.
You can’t reposition a plurimillennial brand without being explicit about what you are doing, why you are doing it, and which additional value it will bring to everyone. It is necessarily a big deal, necessarily difficult, and the time-window to get it right is extremely narrow.
This might be the largest product recall in memory—we sold a bogus idea of math to billions of people, and the bill is coming due.
The only way is to lead with the core message and hammer it over and over again. Let me do my share by citing another Thurston quote (also picked by Avigad as the epigraph to his recent essay on Mathematical Understanding):
We are not trying to meet some abstract production quota of definitions, theorems and proofs. The measure of our success is whether what we do enables people to understand and think more clearly and effectively about mathematics.
Long term, that could force a complete rethinking of the ethos and evaluation criteria of the profession. Mathematicians will have to revoke the honor code and the second law of the Intuition Club—“if you really want to talk about intuition, make it sound casual and accessory, because we ain’t the psychology department.” If Thurston is correct, then we are the psychology department.
All that is easier said than done. The mathematical community isn’t a structured organization that can convene an emergency offsite and put together a crisis response.
It doesn’t even share a unified view of mathematics. Some genuinely think the whole point is to “solve previously unsolved problems”, while others object to any serious discussion of intuition. I disagree with both of these views, but my opinion is nothing but that—an opinion.
The absence of a clear definition and mission statement keeps the community on the back foot.
Christian Szegedy, a researcher and co-founder of both xAI and Math Inc, predicted that a “super-human mathematician AI” would emerge by June 2026. This is the kind of declaration that is either empty (electromechanical calculators have been superhuman for a century) or maximalist in the worst possible manner. Not surprisingly, Szegedy also wrote that he found Terry Tao “out of touch with the reality that was coming up quickly.”
The problem isn’t with the prediction itself (Szegedy has real technical legitimacy and the right to his opinions, however grandiose) but with the general climate that makes it very difficult for a response to be heard.
One simple idea I keep returning to is that we need something like a Mathematical Intelligence Scale, or Mathematical Automation Scale, like there is for self-driving cars.
That would address the pressing danger of objective benchmarks that, because of their very nature, fail to account for intelligibility and concept-building. When AI crushes the next iteration of First Proof, the press release should read: “AI solves First Proof and achieves level 3 (out of 10) on the Mathematical Intelligence Scale”.
Mathematicians shouldn’t be afraid of subjectivity. It is not about cheating. It is about being fair and not committing collective seppuku. In fact, any act of problem-setting is already subjective—there is nothing objective about the First Proof problems, Erdős problems, or Millennium Prize problems, apart from the fact that (in theory) we can objectively assess whether any of them has been solved.
This quality of being objectively verifiable is missing from the more profound aspects of mathematical creativity, such as problem-setting itself, canonization and concept-building, reframing, simplifying, conjecturing, up to major theory-building and the creation of entirely new branches of mathematics—yet that doesn’t mean these activities do not exist and aren’t central to mathematics.
There is even an objective dimension to mathematical subjectivity. When Descartes connected algebra to geometry, when Newton and Leibniz invented calculus, it was pretty clear to their contemporaries that something major had just taken place. In a sense, our entire technostructure, including AI startups, lives on the dividends of those inventions.
An ideal scale would be open-ended—or have seemingly inaccessible upper rungs—since math isn’t something to be “solved.”
Mathematics is known to be inexhaustible and infinitely hard, which means there is room for infinite progress. The one urgent task is to make clear to everyone that mathematics isn’t a trophy up for grabs.
Sadly, a lot of the damage is already baked in. Even with perfect execution, it is impossible to conduct an epistemic revolution under the pressure of an external timeline. This one will have to take place in a broader context that is extremely unfavorable to core science, where every nuance is instantly dismissed as gatekeeping.
These will be difficult years for mathematicians. Students will desert. Money will dry up. Postdocs will ask pressing questions and senior faculty will have to come up with answers. At parties, people will start giving them weird looks, as if they had become living fossils.
Yet, at some point, the pressure will ease.
The industry will have to normalize and the AI-for-math startups will have to find their business models. Which, in all likelihood, won’t be about solving Erdős problems.
To me, this might be the most striking aspect of the current moment. For all the flex and fury, the AI labs are barely engaging with mathematics itself, its core problems and significance. Which is entirely normal, since this is none of their business.
When the dust settles, if it ever does, we will find out that human mathematics has survived, transformed.
I won’t make any prediction on the timeline, because I have no idea. I don’t know which capabilities will be achieved when, which breakthroughs they will enable, and which limitations will appear in hindsight.
What I can say is that the change is already taking place. This really is a scientific revolution and the research community has started actively engaging with it. As I was writing this article, I saw remarkable developments every single day in my timeline. My initial plan was to cite the most striking ones, but there are just too many and I am overwhelmed.
A pessimist would argue that this is a fear-induced response and a symptom of the disruption. We are seeing the same pattern across all industries. When capital gets reallocated so fast, on a scale so massive, there may be no rational way to prepare for the future.
It is not clear to me how I would approach this if I hadn’t quit active research. The low-hanging fruit would be to vibe-code a formalization of my yet-unwritten cellular decomposition theorem. Then I could test a few wild ideas and conjectures. But all this is old-world thinking, and I suspect that even the collaboration modes and deliverables of mathematical research will have to change.
My view, which is hardly original, is that it will become unthinkable to do math without AI assistance, just like it has become unthinkable to do math without set theory and LaTeX.
But this says nothing about the second-order effects. I tried to imagine them as a forecasting exercise, and came up with a dozen or so ideas, not all profound and not all credible. Here are three that I find interesting:
1. A clearer cut between pure and applied mathematics. While both deserve to be treated as first-class citizens, I was never happy with the traditional way of drawing the line. Algebra is presumed to be pure because it is abstract, yet it can be immensely practical. On the other hand, “applied” domains tend to have their fair share of theoretical offshoots.
My take is that pure and applied should be regarded as different sweet spots in the intelligibility-vs-applicability trade-off. It is entirely rational to invest 10 billion dollars of compute in the equations of plasma containment, if it fast-tracks the design of grid-scale fusion reactors, even if intelligibility has to wait for another century or two. I wouldn’t say the same about the Riemann hypothesis.
2. The rise of intuition-maxxers. To me, that would be the clearest sign of a new Golden Age. We are not seeing them right now because the few people with the rare combination of math and agentic skills are poached by the AI labs. But this is a market anomaly that is likely to correct in the near future (the combination is becoming less rare, and the AI labs can only absorb so many of them).
I suspect AI will enable the same transformation that samplers and sequencers enabled in music. The technical barrier to grasping frontier concepts will drop, and brave young mathematicians will deploy unorthodox tactics to survey entire new continents at a pace that will mystify their elders.
They will see further than anyone before, standing on the shoulders of giant machines.
3. A renewed interest in the philosophy and neuroscience of mathematics. In his 1979 essay, Reuben Hersh noted that Platonism and formalism were both untenable, and added this comment:
The problems of truth and meaning are not technical issues in some recondite branch of logic or set theory. They confront anyone who uses or teaches mathematics. If we wish, we can ignore them... It would be surprising if this had no practical consequences.
AI is confronting us with a practical consequence that no one10 saw coming, and that is already impacting the cognitive well-being of the general population. If we fail to articulate the neuroplastic value of intelligibility, there will be no justification whatsoever for teaching mathematics, given that the solutions to all the mathematical problems encountered by 99.99999% of the population over the course of their lifetime will be available at their fingertips.
People are vaguely aware that this would result in idiocracy, but they can’t really explain why, and we have yet to find teaching and evaluation modalities that are adapted to the new reality.
What concerns me is the false belief that there is a “normal” level of mathematical competency, as if certain concepts were any more “real” or “concrete” than others, and advanced mathematics were gratuitously abstract. This creates the illusion that the current (and wildly unsatisfactory) abilities of the general public somehow reflect a “natural” state—when they are in fact the result of massive (yet frustratingly inefficient) pedagogical investment—and represent some solid ground. The truth is that there really is no bottom, and nothing prevents humans from functioning in a culture with no vocabulary for numbers above 5 (as is the case in some hunter-gatherer tribes).
Some prophesy that mathematics will eventually resemble Chess, a sport that a few eccentrics practice with passion and the general public can safely ignore.
This is a shortsighted view that fails to account for the meaning-making and truth-seeking dimensions of mathematics, as well as the massive cognitive benefits of mathematical training. The world makes a lot more sense once you have an intuitive understanding of numbers, sets, statistics, linear algebra, calculus, differential geometry, machine learning, combinatorics, dynamical systems, model theory, group theory, singularity theory, homotopy theory, lambda calculus, complexity, category theory,… (there is no reason to stop at any given place, even though it is probably unwise to order the full menu).
The situation isn’t unique to mathematics. AI adoption creates the same exact problems in many other domains. But mathematics has a unique status and a unique aura, owing to its ancient companionship with the very notion of intelligence.
This is why the “AI for math” situation is so significant, and why its proper resolution matters so much for the future of our species.
Many people hate math, and they tend to put AI in the same bag. They view anything with mathematics or computers in it as a threat to their humanity and subjectivity, their raw experience of being alive, immersed in this world, trying to make intuitive sense of it.
They might be surprised—and thankful—to find out that mathematicians are actually on their side, fighting the good fight. As Carl Jacobi once put it, “the object of mathematics is the honor of the human spirit.”


