Olympiad-level formal mathematical reasoning with reinforcement learning

Nature (2025)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

A long-standing goal of artificial intelligence is to build systems capable of complex reasoning in vast domains, a task epitomized by mathematics with its boundless concepts and demand for rigorous proof. Recent AI systems, often reliant on human data, typically lack the formal verification necessary to guarantee correctness. By contrast, formal languages such as Lean¹ offer an interactive environment that grounds reasoning, and reinforcement learning (RL) provides a mechanism for learning in such environments. We present AlphaProof, an AlphaZero-inspired² agent that learns to find formal proofs through RL by training on millions of auto-formalized problems. For the most difficult problems, it uses Test-Time RL, a method of generating and learning from millions of related problem variants at inference time to enable deep, problem-specific adaptation. AlphaProof substantially improves state-of-the-art results on historical mathematics competition problems. At the 2024 IMO competition, our AI system, with AlphaProof as its core reasoning engine, solved three out of the five non-geometry problems, including the competition’s most difficult problem. Combined with AlphaGeometry 2³, this performance, achieved with multi-day computation, resulted in reaching a score equivalent to that of a silver medallist, marking the first time an AI system achieved any medal-level performance. Our work demonstrates that learning at scale from grounded experience produces agents with complex mathematical reasoning strategies, paving the way for a reliable AI tool in complex mathematical problem-solving.
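
To make concrete what "formal verification" means here, the following is a minimal Lean 4 sketch of a machine-checked statement. It is a toy illustration only: the theorem name `toy_add_comm` is hypothetical, the statement is not drawn from the paper or from any competition, and it says nothing about how AlphaProof itself searches for proofs. It relies only on `Nat.add_comm` from Lean's core library.

```lean
-- Toy example (an assumption for illustration, not from the paper):
-- addition of natural numbers is commutative.
-- Lean accepts the theorem only if the proof after `:= by` type-checks,
-- which is the sense in which the environment "grounds" reasoning.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  -- Close the goal with the core-library lemma `Nat.add_comm`.
  exact Nat.add_comm a b
```

If the proof were wrong or incomplete, Lean would reject it with an error rather than accept a plausible-looking argument. This accept-or-reject check is the kind of unambiguous signal that an RL agent can learn from.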

Author information

Authors and Affiliations

  1. Google DeepMind, London, UK

    Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z. Horváth, Goran Žužić, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, Ottavia Bertolli, Tom Zahavy, Amol Mandhane, Jessica Yung, Iuliya Beloshapka, Borja Ibarz, Vivek Veeriah, Lei Yu, Oliver Nash, Paul Lezeau, Salvatore Mercuri, Calle Sönne, Bhavik Mehta, Alex Davies, Daniel Zheng, Fabian Pedregosa, Yin Li, Ingrid von Glehn, Mark Rowland, Samuel Albanie, Ameya Velingker, Simon Schmitt, Edward Lockhart, Edward Hughes, Henryk Michalewski, Nicolas Sonnerat, Demis Hassabis, Pushmeet Kohli & David Silver

Corresponding authors

Correspondence to Thomas Hubert or Eric Wieser.

Supplementary information

Supplementary Information

This PDF contains numerical solve rates supplementing Extended Figure 4, hyperparameter values, details on the problems used to benchmark autoformalization, and sample proofs from the AlphaProof agent, primarily from the PutnamBench benchmark. The file includes Supplementary Tables 1–7 and Supplementary Figures 1–6.

Supplementary Data

Pseudocode written in Python that elaborates the high-level structure of AlphaProof: RL, autoformalization, and variant generation.

About this article

Cite this article

Hubert, T., Mehta, R., Sartran, L. et al. Olympiad-level formal mathematical reasoning with reinforcement learning. Nature (2025). https://doi.org/10.1038/s41586-025-09833-y

  • DOI: https://doi.org/10.1038/s41586-025-09833-y