- Article
- Rishi Mehta,
- Laurent Sartran (ORCID: orcid.org/0000-0002-5538-8327),
- Miklós Z. Horváth (ORCID: orcid.org/0000-0001-6928-7423),
- Goran Žužić,
- Eric Wieser (ORCID: orcid.org/0000-0003-0412-4978),
- Aja Huang,
- Julian Schrittwieser,
- Yannick Schroecker,
- Hussain Masoom,
- Ottavia Bertolli (ORCID: orcid.org/0000-0001-8578-3216),
- Tom Zahavy,
- Amol Mandhane,
- Jessica Yung,
- Iuliya Beloshapka,
- Borja Ibarz,
- Vivek Veeriah,
- Lei Yu,
- Oliver Nash (ORCID: orcid.org/0000-0001-7208-6307),
- Paul Lezeau (ORCID: orcid.org/0009-0006-6142-9953),
- Salvatore Mercuri (ORCID: orcid.org/0000-0002-8997-8632),
- Calle Sönne (ORCID: orcid.org/0009-0008-1871-426X),
- Bhavik Mehta,
- Alex Davies (ORCID: orcid.org/0000-0003-4917-5234),
- Daniel Zheng,
- Fabian Pedregosa,
- Yin Li,
- Ingrid von Glehn,
- Mark Rowland,
- Samuel Albanie,
- Ameya Velingker,
- Simon Schmitt,
- Edward Lockhart,
- Edward Hughes (ORCID: orcid.org/0000-0002-2434-2334),
- Henryk Michalewski,
- Nicolas Sonnerat,
- Demis Hassabis,
- Pushmeet Kohli (ORCID: orcid.org/0000-0002-7466-7997) &
- …
- David Silver (ORCID: orcid.org/0000-0002-5197-2892)
Nature (2025)
We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.
Abstract
A long-standing goal of artificial intelligence is to build systems capable of complex reasoning in vast domains, a task epitomized by mathematics with its boundless concepts and demand for rigorous proof. Recent AI systems, often reliant on human data, typically lack the formal verification necessary to guarantee correctness. By contrast, formal languages such as Lean (ref. 1) offer an interactive environment that grounds reasoning, and reinforcement learning (RL) provides a mechanism for learning in such environments. We present AlphaProof, an AlphaZero-inspired agent (ref. 2) that learns to find formal proofs through RL by training on millions of auto-formalized problems. For the most difficult problems, it uses Test-Time RL, a method of generating and learning from millions of related problem variants at inference time to enable deep, problem-specific adaptation. AlphaProof substantially improves state-of-the-art results on historical mathematics competition problems. At the 2024 IMO competition, our AI system, with AlphaProof as its core reasoning engine, solved three out of the five non-geometry problems, including the competition’s most difficult problem. Combined with AlphaGeometry 2 (ref. 3), this performance, achieved with multi-day computation, resulted in reaching a score equivalent to that of a silver medallist, marking the first time an AI system achieved any medal-level performance. Our work demonstrates that learning at scale from grounded experience produces agents with complex mathematical reasoning strategies, paving the way for a reliable AI tool in complex mathematical problem-solving.
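To make concrete what "an interactive environment that grounds reasoning" means, here is a minimal Lean 4 sketch. It is an illustration only and is not taken from the paper or from AlphaProof's output: the theorem name `and_swap_example` and the statement are hypothetical, chosen to be far simpler than any competition problem. Each tactic line transforms the current goal, and Lean accepts the proof only if every step checks.

```lean
-- Hypothetical toy statement (not from the paper): conjunction is symmetric.
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  -- `constructor` splits the goal `q ∧ p` into two subgoals: `q` and `p`.
  constructor
  -- First subgoal: `q` follows from the right component of `h`.
  · exact h.right
  -- Second subgoal: `p` follows from the left component of `h`.
  · exact h.left
```

In an RL setting such as the one the abstract describes, one can think of the open goals as the state and each tactic as an action; the proof checker's verdict then provides a reward signal that requires no human labelling. This framing is a reading of the abstract, not a description of the system's actual interface.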
Supplementary information
Supplementary Information
This PDF contains numerical solve rates supplementing Extended Figure 4, hyperparameter values, details on the problems used to benchmark autoformalization, and sample proofs from the AlphaProof agent. These proofs are primarily from the PutnamBench benchmark. The file includes Supplementary Tables 1-7 and Supplementary Figures 1-6.
Supplementary Data
Pseudocode written in Python elaborating the high-level structure of AlphaProof: RL, autoformalization, and variant generation.
Cite this article
Hubert, T., Mehta, R., Sartran, L. et al. Olympiad-level formal mathematical reasoning with reinforcement learning. Nature (2025). https://doi.org/10.1038/s41586-025-09833-y
DOI: https://doi.org/10.1038/s41586-025-09833-y