IMO 2025 Problems: How well will AI do?

With the 2025 International Math Olympiad underway, there's a lot of excitement and questions about how well AI will do. Will DeepMind beat their performance last year and get gold? Will frontier models be found overfit? Will formal or informal methods prevail?

I don't have any direct connection or inside scoop on the companies competing at the IMO, but I have participated both as a contestant and as a team leader for the IMO, been working in AI/ML, and have been building out AI tools to help with math research. So I thought it would be fun to lock in my bet before we hear the final results!

I've also run it through the major LLMs as soon as I could, to try to prevent data tainting. The problems are being discussed on forums, X, and YouTube and there's already evidence that reasoning models are finding online solutions.

Summary

I would order the difficulties (for AI) as: P4 < P2 < P5 < P3 < P1 « P6

We've seen models solve problems like the first two, the next two or three should be in reach, the last one is very difficult. Purely informal models will have a much harder time at P2 but might have an easier time at P1.

Solving 5 problems out of 6 would be a great performance but might still fall short of a gold.

Cheating

It will be very important to watch for signs of cheating:

Online information leaking after the contest
"Easier" formalizations generated by people instead of standard ones
Brute forcing with compute. Results that come out shortly after the contest is over will be more trustworthy than ones that take multiple days of wall time. High compute solutions are valuable, but can mask exponential blowups that will prevent this work from extending to harder real-world problems.

At the end of the day, this is a time-constrained competition for high schoolers and you can get really far by having full memory of past problems and the ability to hammer out hard algebra.

Update 1: Results of public informal models

MathArena has thoroughly evaluated informal models, with Gemini 2.5 Pro coming highest (and nailing P3!) but still far below where the bronze cutoff is expected.

Test of informal models

Update 2: Cutoffs

Coordination has completed and there are 5 perfect students.

Cutoffs are 19 (bronze) 28 (silver) 35 (gold).

There's a high chance we will see an AI gold medallist, but I'm almost certain humans have won this competition.

Update 3: Results from OpenAI

OpenAI claims gold with problems 1-5 solved done in real-time (4.5 hours) with informal inputs and outputs, and a general purpose approach instead of building IMO-specific techniques. Time to look at their solutions.

Some likely reasons why this is better than the matharena results:

More compute time and more thorough MCTS
Better harness, prompting, evaluating
New unreleased model?

Note: these are not official. The IMO organizers have official rules for submission and grading of AI submissions and have requested that everyone wait until later to announce results so as to not overshadow the students for whom this is the culmination of a life's effort.

Update 4: Results from Google DeepMind

GDM has officially won gold with problems 1-5 solved "end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions – all within the 4.5-hour competition time limit." Their solutions are also very elegant and well-written.

In addition to the earlier explanation about matharena, it does sound like better prompting can help:

We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions

Update 5: Results from ByteDance

ByteDance Seed has announced silver-level performance (30 points) in official competition, and gold-level (35 points, P1-P5) with extended compute. Notably this is the first announced with formal methods (Lean) and their Lean solutions are posted.

Question 1

Problem Statement

A line in the plane is called sunny if it is not parallel to any of the x–axis, the y–axis, or the line x+y=0.

Let n≥3 be a given integer. Determine all nonnegative integers k such that there exist n distinct lines in the plane satisfying both of the following: for all positive integers a and b with a+b≤n+1, the point (a,b) lies on at least one of the lines; and exactly k of the n lines are sunny.

Resources

Analysis

This is a fairly challenging problem 1, and should be hard for both humans and AIs. The answer looks a bit strange and unintuitive, but ultimately just requires one major trick and some case work. I wouldn't be surprised if several people get docked partial marks for missing a case.

I expect it will be especially tricky for AI since it requires geometric intuition and is the kind of combinatorial problem that is tough to formalize. Given how many different ways there are to formalize a problem like this, I would look very carefully at the formal statement each team provided, since teams can cheat by picking an especially easy formulation.

Deepmind did not attempt combinatorics last year but since then autoformalization of combinatorics problems, and hence the potential for RL on it, has pushed ahead significantly.

This problem will be a great litmus test.

Question 2

Problem Statement

Let Ω and Γ be circles with centres M and N, respectively, such that the radius of Ω is less than the radius of Γ. Suppose Ω and Γ intersect at two distinct points A and B. Line MN intersects Ω at C and Γ at D, so that C, M, N, D lie on MN in that order. Let P be the circumcentre of triangle ACD. Line AP meets Ω again at E≠A and meets Γ again at F≠A. Let H be the orthocentre of triangle PMN.

Prove that the line through H parallel to AP is tangent to the circumcircle of triangle BEF.

Resources

Analysis

This is the kind of angle chasing solution that should be easy for AlphaGeometry and any model trained explicitly on geometry problems. I expect a purely informal system will not be able to solve it, but potentially a general formal system with access to a geometry library will, and that would be very exciting.

There's a lot going on in the problem and students could easily get bogged down keeping track of it all, especially with the competition pressure, but that's not a big challenge for computer systems.

Question 3

Problem Statement

Let ℕ denote the set of positive integers. A function f: ℕ→ℕ is said to be bonza if f(a) divides b^a − f(b)^f(a) for all positive integers a and b.

Determine the smallest real constant c such that f(n) ≤ cn for all bonza functions f and all positive integers n.

Resources

Analysis

I'm glad I was working on this problem without the pressure of being in a competition. It can be easy to make mistakes when wrapping up the cases, or get stuck in a wrong direction, especially with the psychological impact of this being a problem 3.

Questions where you have to find something are harder for AI systems than questions where you have to prove something. In this case the best c value cannot be readily guessed, so some systems could get bogged down with this. That said, Grok 3 Mini comes up with the right answer and a handwavy explanation. Does it have web access to these solutions?

However the actual solution is not too bad, simply applying standard approaches for functional equation and number theory problems, with steady indicators of progress and being on the right track, so I would expect most AIs would be able to brute force this out (assuming they don't get stuck on guessing the right answer).

Question 4

Problem Statement

A proper divisor of a positive integer N is a positive divisor of N other than N itself.

The infinite sequence a₁, a₂, … consists of positive integers, each of which has at least three proper divisors. For each n≥1, the integer a_n+1 is the sum of the three largest proper divisors of a_n.

Determine all possible values of a₁.

Resources

Analysis

Grok 4 gets the right final answer, O-1 and O-3 get directionally valid arguments. Fairly straightforward and similar to problems that AI models have solved in the past, so should not be a challenge.

Question 5

Problem Statement

Alice and Bazza are playing the inekoaloty game, a two-player game whose rules depend on a positive real number λ which is known to both players. On the nth turn of the game (starting with n=1) the following happens:

If n is odd, Alice chooses a nonnegative real number x_n such that x₁ + x₂ + … + x_n ≤ λn.

If n is even, Bazza chooses a nonnegative real number x_n such that x₁² + x₂² + … + x_n² ≤ n.

If a player cannot choose a suitable x_n, the game ends and the other player wins. If the game goes on forever, neither player wins. All chosen numbers are known to both players.

Determine all values of λ for which Alice has a winning strategy and all those for which Bazza has a winning strategy.

Resources

Analysis

o4-mini solves it, and Grok 4 once again gives the right answer without proof. Not a hard problem to work through, but we haven't yet seen these kinds of formalized game theory proofs attempted by AI, hopefully this will be a good proof of concept! LLMs seem to be able to intuit the proof outline, and then it's a matter of formalizing it.

Question 6

Problem Statement

Consider a 2025×2025 grid of unit squares. Matilda wishes to place on the grid some rectangular tiles, possibly of different sizes, such that each side of every tile lies on a grid line and every unit square is covered by at most one tile.

Determine the minimum number of tiles Matilda needs to place so that each row and each column of the grid has exactly one unit square that is not covered by any tile.

Resources

Analysis

This one is going to be very hard and will test a lot of the AI skills! Much harder than P5 from last year. I would guess the models are not strong enough but I hope to be surprised. I don't expect any AI to find the construction but maybe it can get lucky with finding the LIS/LDS argument for a partial mark.

Although anyone who looked down at their feet at the airport would be at an advantage:

Sunshine Coast airport floor