A New Breakthrough in AI Solving Mathematical Conjectures: The Peking University Team's Exploration

15 min read Original article ↗

This article from Peking University's Beijing International Center for Mathematical Research (BICMR) reports what may be the most significant milestone yet in AI-assisted mathematical research: a fully automated AI framework that solved an open problem in commutative algebra and verified the proof in approximately 19,000 lines of Lean 4 code. This is an end-to-end pipeline where AI agents autonomously discovered and formally verified a solution to a previously unsolved research problem.

The team is led by Bin Dong (董彬), a Boya Distinguished Professor at BICMR who also directs the Center for Theory of AI at Peking University's Institute for Artificial Intelligence and serves as Deputy Director of the Center for Machine Learning Research. Dong's background bridges applied and computational mathematics with machine learning, positioning him at the exact intersection this project demands. His conviction, articulated as early as 2023, was that serious mathematical AI must combine natural language reasoning with formal verification, a vision that has since been vindicated by the team's architecture. The four senior faculty members bring complementary strengths: Ruochuan Liu (刘若川), an academician of the Chinese Academy of Sciences and Dean of Peking University's School of Mathematical Sciences, specializes in p-adic number theory and arithmetic geometry; Liang Xiao (肖梁), a professor at BICMR since 2019 who earned his PhD from MIT under Kiran Kedlaya and holds a gold medal with a perfect score from the 42nd International Mathematical Olympiad, provided the algebraic expertise that guided problem selection and proof strategy; and Zaiwen Wen (文再文), a Boya Distinguished Professor with a PhD in operations research from Columbia University (2009), contributes optimization and algorithmic foundations.

The institutional context matters. BICMR, founded in 2005 and directed since its inception by Gang Tian (田刚), the renowned geometer and co-proposer of the Yau-Tian-Donaldson conjecture, sits on the north shore of Peking University's Weiming Lake in the traditional Jingchunyuan gardens. It is a government-funded research center explicitly tasked with pushing frontier mathematics and cultivating world-class talent. The AI4Math team's "spontaneously organized" formation, described in the article as researchers naturally converging around a shared conviction rather than being assembled for a specific grant reflects the intellectual freedom and autonomy of BICMR.

The team built two collaborating AI agents: Rethlas, which handles natural-language mathematical reasoning (literature search, proof construction, trial-and-error refinement), and Archon, which translates natural-language proofs into formally verified Lean 4 code. Underpinning both is a dual search engine infrastructure: LeanSearch, which enables semantic search over Mathlib's hundreds of thousands of formalized theorems using natural language queries (now widely adopted by the Lean community, with 8,000+ daily API calls), and Matlas, which covers tens of millions of mathematical statements in natural language.

The specific result, a disproof of the Anderson Conjecture from Dan D. Anderson's 2014 collection Open Problems in Commutative Ring Theory, illustrates a striking capability. Rethlas used Matlas to locate a technical result by Jensen (2006) about completions of integral domains, a paper from an entirely different subfield that turned out to provide exactly the tool needed to construct a counterexample. This kind of cross-domain retrieval, which in traditional mathematical practice would require deep conversations between specialists in different areas, is precisely where AI search at scale can outperform human memory and disciplinary boundaries. Archon then converted the proof into roughly 19,000 lines of Lean 4 code, making two autonomous decisions along the way: rejecting its initial formalization plan after detecting a hidden logical gap, and finding an alternative route when a required mathematical concept was absent from Mathlib.

Some Remarks:

  • Lean 4: A high-performance functional programming language that doubles as an interactive theorem prover for writing verified code and mathematical proofs.

  • Mathlib: A massive, community-driven library of formalized mathematics that provides the definitions and theorems needed to do advanced research in Lean 4.

Source: Beijing International Center for Mathematical Research (北京国际数学研究中心BICMR) | Date: Not specified in source

Author Background: This article was published by the official WeChat account of the Beijing International Center for Mathematical Research (BICMR) at Peking University, reporting on the work of the PKU AI4Math team led by Professor Bin Dong (董彬).

  1. Peking University's AI4Math team has built an automated AI framework that solved the Anderson Conjecture, an open problem in commutative algebra, and formally verified the proof in approximately 19,000 lines of Lean 4 code.

  2. The team, formed in 2023, combines expertise across algebra, number theory, optimization, and machine learning, deliberately targeting open research problems rather than competition mathematics.

  3. A dual search engine architecture (LeanSearch for formalized math, Matlas for natural-language math) is the system's core innovation, enabling AI agents to locate cross-domain results that human researchers would struggle to connect.

  4. Two AI agents, Rethlas and Archon, handle natural-language reasoning and formal verification respectively, with Archon making autonomous decisions to correct logical gaps and work around missing library components.

  5. The result demonstrates a viable new research paradigm for AI-assisted mathematics, though the system currently relies on foreign foundation models.

Note: Bold-italicized text indicates emphasis by the translators.

Interior courtyard of the BICMR campus

Interior courtyard of the BICMR campus at Peking University.

Recently, the AI4Math team, assembled by Professor Bin Dong's (董彬) research group at the Beijing International Center for Mathematical Research (BICMR) at Peking University in collaboration with partners including IQuest Research Institute (至知创新研究院), announced that their automated AI framework has solved an open problem in commutative algebra known as the Anderson Conjecture, and completed approximately 19,000 lines of formal verification in Lean 4. Behind this achievement lies three years of accumulated effort by a team rooted in Peking University's mathematical tradition.

In 2021, DeepMind published a striking paper in Nature titled "Advancing Mathematics by Guiding Human Intuition with AI," demonstrating how AI could assist mathematicians in discovering new mathematical patterns. The paper attracted wide attention in the mathematical community and caught the eye of many faculty members at BICMR.

In early 2022, through the introduction of Professors Ruochuan Liu and Liang Xiao, Professor Dong's group began collaborating with Xuhua He (何旭华), then a professor at the Chinese University of Hong Kong, using machine learning to analyze a class of algebraic geometry objects known as affine Deligne-Lusztig varieties. The research not only "rediscovered" known dimension formulas but also detected systematic deviations in boundary regions where current theory was relatively weak, leading to the proof of new theorems. The results were published in 2024 in the Peking Mathematical Journal. This was the team's first experience seeing AI capture mathematical patterns that humans had not yet noticed.

Paper citation image

B. Dong, X. He, P. Jin, F. Schremmer, Q. Yu, "Machine learning assisted exploration for affine Deligne-Lusztig varieties," Peking Mathematical Journal, 2024.

In early 2023, large language models advanced rapidly. Dong, already deeply engaged with machine learning, began asking a more ambitious question: could AI participate more comprehensively in mathematical research? Many colleagues considered this far-fetched, but Dong believed it was worth a full effort. His key insight was that natural language reasoning and formal verification had to be combined. The former would search through existing literature for clues and construct proof strategies, while the latter would translate those strategies into formalized code that a computer could check line by line. This vision has guided the project ever since.

Dong's ideas earned the endorsement and support of his colleagues, and the Peking University AI4Math team was formed. Professors Liu, Xiao, Zaiwen Wen (文再文), and Dong brought expertise from algebra and number theory, optimization, and machine learning and artificial intelligence respectively. Members ranged from undergraduates to senior researchers. Unlike most teams at the time that focused on mathematical competitions, this group targeted open conjectures in algebra as their primary direction from the outset. Pure mathematicians identified the essence of problems, while applied mathematicians built algorithms and systems. Peking University's long-standing strength across multiple foundational areas of mathematics provided the soil for this deep collaboration.

For AI to perform serious mathematical reasoning, it requires a complete infrastructure stack: retrieval, reasoning, data, and evaluation. The team identified retrieval as the most critical component, even though it is often underestimated. When confronted with a new problem, finding the right entry point, which theoretical tools or perspectives from which subfield might be relevant, is often the step that demands the most mathematical wisdom. Mathematical knowledge is distributed across numerous subdisciplines, and cross-domain connections are hidden rather than obvious. AI must first locate the relevant theories and literature before it can begin constructing proofs.

Formal verification relies on Lean, a proof assistant where every step of a mathematician's derivation is checked line by line by the Lean kernel. Underpinning Lean is Mathlib, a formalized mathematical library containing hundreds of thousands of theorems and definitions. The team essentially started from scratch in this area: members taught themselves Lean, then gradually organized courses and seminars in China to promote the adoption of formalized mathematics. In the process, the team confronted a core bottleneck: how can AI quickly find the specific theorem it needs among Mathlib's hundreds of thousands of entries? To address this, the team built LeanSearch, which allows semantic retrieval of relevant theorems using natural language descriptions. It is now widely used by the official Lean community, with API calls exceeding 8,000 per day.

The natural language literature side faces its own retrieval bottleneck. The team built Matlas, covering tens of millions of mathematical statements and supporting proposition-level semantic search. LeanSearch covers the formalized world; Matlas covers the natural language world. This dual-engine architecture is the framework's defining feature, distinguishing it from other AI mathematics systems.

Beyond retrieval, the team has made systematic contributions in reasoning, data, and evaluation, which are not discussed in detail here due to length constraints.

System architecture overview

System architecture overview.

On top of this infrastructure, the team built two collaborating AI agents, both constructed on top of existing foundation models.

Rethlas handles natural language reasoning: searching the literature, constructing proof plans, and iteratively refining through trial and error. It simulates how a mathematician works: proposing conjectures, attempting constructions, backtracking when encountering contradictions, and gradually converging on a viable proof path through multiple rounds of experimentation.

Archon handles formal verification: translating natural language proofs into Lean 4 code and filling in all details. Archon was previously open-sourced and had its formalization capabilities validated on research-grade benchmarks.

The team built a comprehensive set of capabilities for both agents, tailored to the demands of mathematical research: reasoning strategies and proof planning methods distilled from the research practices of working mathematicians; the ability to precisely retrieve from hundreds of thousands of formalized theorems and tens of millions of natural language statements; and mechanisms for autonomously finding alternative paths when infrastructure is missing. Behind these capabilities lies the Peking University team's deep understanding, built over years of sustained reflection, of what mathematical research actually requires. This understanding is not something a general-purpose large model can replace.

Rethlas-Archon collaboration workflow.

Professor Liang Xiao, who has long been deeply engaged in algebra, played a critical role throughout this process. He not only helped the team systematically screen for worthwhile open problems in algebra, but more importantly, infused the agents' construction with a mathematician's modes of thinking, taste, and aesthetic judgment: how to identify the core difficulty of a problem, how to choose a proof strategy, when to persist and when to change course. These hard-to-articulate research intuitions were gradually distilled, under his guidance, into executable reasoning strategies for the agents. The Anderson Conjecture was one of the targets that Xiao selected for the team.

This particular open problem, cataloged in Open Problems in Commutative Ring Theory, was proposed by University of Iowa mathematician D.D. Anderson in 2014. As Xiao explained, the problem concerns a property of "quasi-complete local rings," a class of rings designed to characterize the infinitesimal structure and deformations of geometric objects locally (for example, near a specific point). It is a foundational open question in the branch of commutative ring theory. Specifically, the question asks whether a Noetherian local ring satisfying a certain "weak" approximation property automatically satisfies a stronger version of the same property. This type of question recurs throughout mathematics: the gap between weak and strong conditions often conceals unexpected constructions. The conjecture remained unsolved since it was proposed, because the technical tools needed to construct a counterexample were scattered across different subfields, making it difficult for any single researcher to master them all.

Rethlas used Matlas to search through tens of millions of mathematical statements and located a technical result by Jensen (2006) about completions of integral domains, from a seemingly unrelated direction. Jensen's work had nothing to do with the Anderson problem, but Rethlas recognized that it provided exactly the tool needed to construct a counterexample, and delivered a negative answer: the weak condition does not imply the strong condition. This kind of precise cross-domain retrieval is precisely what ordinarily requires deep exchanges between specialists in different fields to achieve.

Archon then translated the proof into approximately 19,000 lines of Lean 4 code. Formalization is not translation: every "obviously" in a paper must be rigorously expanded in Lean 4. Archon made two autonomous decisions. First, it discovered a hidden logical flaw in the initial plan, rejected the original approach, and redesigned the overall technical route for the formalized proof. Second, it found that a required mathematical concept had not yet been included in Mathlib, and autonomously identified an equivalent alternative route to circumvent the obstacle. The final code covered key results from six external papers. Compared to an experienced Lean expert, Archon completed formalization work of equivalent scope with at least a 10x efficiency improvement.

The breakthrough is encouraging. As Ruochuan Liu observed: "This has shown us a possible form of deep integration between mathematics and AI. What mathematicians need to pay attention to is not just whether a specific problem has been solved, but whether this new research paradigm can sustain output and cultivate a new generation of young researchers who both understand mathematics and can harness AI tools. Dong's team's exploration over the past few years, from building tools to solving problems, has forged a persuasive path. We hope the AI4Math direction can attract more outstanding young people. Peking University's mathematics community has the confidence and the conditions to provide them with fertile ground for growth."

It is worth noting that the organizational model behind this work is itself unusual. The PKU AI4Math team was not assembled around a specific project, but rather grew from the natural convergence of people who shared a common conviction about the direction. This "spontaneously organized" research model remains rare in China, and its formation would not have been possible without strong support from the university and its departments.

A "spontaneously organized" team model carries a distinctive idealistic spirit, but it also faces real pressures and challenges. Externally, AI technology evolves by the day, and competition in both academia and industry is fierce and full of uncertainty. When to hold firm and when to adjust course, each strategic judgment affects the entire team's direction, with pressure and anxiety as constant companions. Internally, team members come from different branches of mathematics and different areas of computer science, with varying academic backgrounds, personal aspirations, and value systems. Aligning everyone toward a shared purpose requires continuous communication and encouragement, as well as persuading young researchers to step outside the comfort of existing evaluation frameworks and pursue work with genuine long-term value.

Academician Gang Tian (田刚) noted: "The PKU AI4Math team, composed of Dong and professors Liu, Xiao, and others, has used an AI framework to solve an open problem in commutative algebra and completed large-scale formal verification. This is the first such achievement in China and carries significant demonstration value. This result demonstrates that Peking University's mathematics community has a vibrant academic ecosystem, and that PKU's young mathematicians possess tremendous innovative vitality. I hope to give young scholars more platforms, encouraging and supporting them to innovate boldly, to explore courageously, to further advance the deep integration of AI and mathematics, to achieve more internationally influential breakthroughs, and to play a key role in major scientific and technological challenges that the nation urgently needs to address."

In frontier areas like AI4Math, close collaboration with social organizations, including enterprises, is one of the keys to keeping the team at the technological frontier. The team specifically acknowledged the close cooperation with IQuest Research Institute (至知创新研究院) on computing power, resources, and technology, as well as the long-term support for basic research from the New Cornerstone Science Foundation (新基石科学基金会).

Looking forward, the intersection of AI and mathematics holds promise on at least three levels: building and leveraging AI tools to tackle increasingly difficult mathematical problems, continuously raising the ceiling of research; studying AI itself, where mathematics offers a unique perspective for understanding and improving AI, with the potential to drive more economical, safer, and more interpretable AI systems; and applying mathematical thinking to harness AI for problems in science and engineering that are equally profound. The possibilities are only beginning to open up.

Currently, the foundation models underlying both agents are primarily based on foreign large language models. The team looks forward to deeper collaboration with leading domestic foundation model teams in the future, jointly advancing this direction.

Detailed information, paper links, and open-source code for all work discussed in this article can be found on Professor Dong's personal homepage (faculty.bicmr.pku.edu.cn/~dongbin/) and the project blog (frenzymath.com). Key links for the Anderson Conjecture solution and open-source tools are below:

Discussion about this post

Ready for more?