Early experiments in accelerating science with GPT-5

Science shapes everything from human health to energy production, from national security to our understanding of the universe. If AI can accelerate science—shortening the time it takes to generate new ideas, or to move from an idea to a tested result—the benefits compound across society.

But the pace of innovation remains a constraint. Even when the right idea exists, turning it into a product or treatment can take years. In a recent survey, 60 percent of people in the U.S. said scientific and medical breakthroughs reach them too slowly; 73 percent said we need better ways to accelerate discovery; and 69 percent identified scientific leadership as a top national priority.

Today, we’re releasing “Early science acceleration experiments with GPT‑5,” a paper co-authored with collaborators at universities and national laboratories including Vanderbilt, UC Berkeley, Columbia, Oxford, Cambridge, Lawrence Livermore National Laboratory, and The Jackson Laboratory. It compiles early case studies across math, physics, biology, computer science, astronomy, and materials science in which GPT‑5 helped researchers synthesize known results in a novel way, conduct powerful literature review, accelerate tough computations, and even generate novel proofs of unsolved propositions. The paper also documents limitations. Our goal is to give the community a clear view of what these systems can and cannot do today in research settings.

These case studies show how, in the hands of experts, GPT‑5 is accelerating scientific discovery, and why that acceleration matters:

  • Biology: In a study led by Derya Unutmaz, M.D., scientists spent months trying to explain a puzzling change in human immune cells. GPT‑5 identified the likely mechanism within minutes from an unpublished chart and suggested an experiment that proved it. This kind of speed could help researchers understand diseases faster and develop better treatments.
  • Mathematics: In another case, researchers Mehtaab Sawhney and Mark Sellke were tackling a decades-old open problem originally posed by Paul Erdős. They were stuck on the final step, and GPT‑5 contributed a new idea about how a single out-of-place number breaks the pattern, which helped them complete the proof. Advances like this strengthen the mathematical foundations that many algorithms and security techniques ultimately rely on.
  • Algorithms & optimization: Researchers Sébastien Bubeck and Christian Coester were testing whether a common decision-making method used in robotics and routing was as reliable as people assumed. GPT‑5 found a new, clear example showing the method can fail and also improved a classic result in optimization, the math used to figure out the best way to solve a problem. This type of advance helps engineers better understand the decision-making systems used in robotics, routing, and other real-world applications.

What is OpenAI for Science? 

The mission of OpenAI for Science is to accelerate scientific discovery: to help researchers explore more ideas, test hypotheses faster, and uncover insights that would otherwise take significant time. We do this by pairing frontier models with the right tools, workflows, and collaborations.

We work closely with researchers across academia, industry, and national labs. These collaborations help us understand where the models are useful, where they fail, and how to integrate them into the scientific process—from literature review and proof generation to modeling, simulation, and experimental design.

Our approach combines two complementary beliefs. Specialized scientific tools, such as simulation engines, protein databases, and computer algebra systems, are essential for efficiency and precision. At the same time, scaling foundation models continues to unlock new reasoning abilities: connecting ideas across fields, sketching proofs, proposing mechanisms, and navigating large literatures conceptually rather than by keyword. Where specialized tools exist, we want to use them; where general reasoning is required, we build models designed to handle it. Both paths reinforce each other.

How scientists are working with GPT‑5 today

The most meaningful progress comes from human–AI teams. Scientists set the agenda: they define questions, choose methods, critique ideas, and validate results. GPT‑5 contributes breadth, speed, and the ability to explore many directions in parallel.

Using GPT‑5 effectively is a skill. Researchers learn how to pose questions, when to push back, how to break problems into steps, and what to validate independently. Productive work often looks like dialogue—researcher and model iterating until a promising direction emerges or the idea is discarded.

The current state of GPT‑5 in scientific work 

Across these early studies, GPT‑5 appears able to shorten parts of the research workflow when used by experts. It does not run projects or solve scientific problems autonomously, but it can expand the surface area of exploration and help researchers move faster toward correct results.

  • One emerging capability is conceptual literature search. GPT‑5 can often identify deeper relationships between ideas and retrieve relevant material across languages and less accessible sources. Researchers report finding references, connections, and theses they did not previously know.
  • In mathematics and theoretical computer science, where structure is explicit and feedback loops are fast, GPT‑5 is especially helpful. Mathematicians have used GPT‑5 to generate viable proof outlines in minutes, transforming work that otherwise might have taken days or weeks. In physics and computational domains, the model can propose simplifying transformations or point to analogous structures in other fields.
  • In biology and other empirical sciences, the model can propose mechanisms and design wet-lab experiments to validate them.

We are beyond the point where models only summarize existing knowledge. GPT‑5 can now make early contributions that meaningfully assist researchers under expert oversight. The pace of improvement suggests the potential for deeper acceleration as capabilities and tools advance.

What this looks like in practice: a few case studies

Optimization is the math of finding the “best” option—like the lowest training loss or the shortest route in a network. Gradient descent is a basic optimization method that takes repeated small steps downhill on a function. A recent theorem by Guy Barzilai, Ohad Shamir, and Moslem Zamani asked when the sequence of values visited by gradient descent forms a convex curve over time (a curve with no dips), which makes the algorithm’s behavior easier to analyze and control. The first version of the paper showed this only for very small, conservative step sizes.
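As a toy illustration of this setting (a minimal sketch, not the paper’s theorem; the example function and names are ours), here is plain gradient descent on f(x) = x². With a small step size, the visited values decrease, and the value curve is convex: each successive drop is smaller than the last.

```python
# Gradient descent on f(x) = x**2, recording the value at each iterate.
# Illustrative only -- not the Barzilai-Shamir-Zamani result itself.

def gradient_descent_values(x0, step, n_steps):
    """Run gradient descent on f(x) = x**2 (so f'(x) = 2x) and record f(x_k)."""
    x = x0
    values = [x * x]
    for _ in range(n_steps):
        x = x - step * 2 * x  # one small downhill step
        values.append(x * x)
    return values

values = gradient_descent_values(x0=1.0, step=0.1, n_steps=10)
# Successive drops in value; a convex value curve means these shrink.
drops = [a - b for a, b in zip(values, values[1:])]
```

With step 0.1 each iterate shrinks by a factor of 0.8, so the values form a decreasing geometric sequence—exactly the dip-free, convex shape the theorem concerns.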

Sébastien Bubeck gave GPT‑5 the weaker version of the result and asked whether the condition could be improved. The model proposed a sharper step-size bound and a cleaner, more standard proof, which he then checked carefully by hand; with more thinking time, an internal run of the model even derived the optimal bound from scratch.

GPT‑5’s contribution: GPT‑5 helped Sébastien Bubeck explore a sharper step-size condition and suggest a cleaner proof for a recent convex optimization theorem, which he verified independently.

Nikita Zhivotovskiy and his collaborators proved a new theorem in convex geometry—the study of “well-behaved” shapes where any line between two points stays inside the shape. Convex geometry underlies many models in machine learning and statistics. Once the theorem was done, the natural next question was: where else could this result be useful?
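The convexity definition above can be checked numerically (a toy sketch under our own naming, unrelated to the theorem itself): in a convex set such as the unit disk, every point on the segment between two members stays inside the set.

```python
# Toy check of the convex-set definition: the segment between any two
# points of the unit disk lies entirely inside the disk.

def in_unit_disk(p):
    """True if point p = (x, y) lies in the closed unit disk."""
    x, y = p
    return x * x + y * y <= 1.0

def segment_point(p, q, t):
    """Point at fraction t (0 <= t <= 1) along the segment from p to q."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

a, b = (0.8, 0.0), (0.0, 0.8)
# Sample eleven points along the segment and confirm all stay inside.
segment_inside = all(in_unit_disk(segment_point(a, b, k / 10)) for k in range(11))
```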

Instead of guessing search terms and scanning the literature by hand, Zhivotovskiy gave GPT‑5 the formal statement of the theorem and asked which areas it might connect to. The model pointed to work in density estimation, learning theory, and multi-objective optimization, and surfaced specific references, including several he had not seen and some in other languages.

GPT‑5’s contribution: GPT‑5 helped Nikita Zhivotovskiy identify concrete connections and references across several fields, including materials he had not encountered.

Tim Gowers, a Fields Medal–winning combinatorialist, ran a series of experiments treating GPT‑5 as a “research partner” rather than a tool for homework-style problems. He gave the model hard combinatorics questions he was actively thinking about and asked it to suggest constructions, find counterexamples, or critique partial arguments.

In multiple cases, GPT‑5 quickly spotted flaws or missing cases in candidate constructions and proposed simpler alternatives or counterexamples; in others, it stalled or failed to make progress. Gowers’ overall conclusion was that the model is already useful as a very fast, very knowledgeable critic that can stress-test ideas and save time, even though it does not yet meet his bar for full co-authorship.

GPT‑5’s contribution: GPT‑5 acted as a fast critic for Tim Gowers, spotting flaws, missing cases, and simpler alternatives during exploratory combinatorics work.

Paul Erdős posed a problem about finding the largest set of positive integers satisfying a surprising rule: for any two numbers in the set, their product plus one must be divisible by the square of a prime. Erdős guessed what the largest such set should look like, but the problem remained open for decades.
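To make the rule concrete, here is a toy check of the condition (illustrative only; the helper names are ours and this plays no role in the proof): a set passes when every pairwise product plus one has a squared prime factor.

```python
# Toy check of the divisibility rule from the Erdős problem described above.

def divisible_by_prime_square(n):
    """True if n is divisible by p**2 for some prime p."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return True  # smaller primes were stripped out, so d is prime here
        while n % d == 0:
            n //= d
        d += 1
    return False

def satisfies_rule(numbers):
    """Every pairwise product plus one must have a squared prime factor."""
    nums = sorted(numbers)
    return all(divisible_by_prime_square(a * b + 1)
               for i, a in enumerate(nums) for b in nums[i + 1:])
```

For example, {1, 3} passes the rule (1·3 + 1 = 4 = 2²), while {3, 7} fails it (3·7 + 1 = 22 = 2·11, which is squarefree).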

Sawhney and Sellke explored the structure of the problem and then asked GPT‑5 to help analyze how a single “out-of-place” number would affect the entire set. GPT‑5 suggested a clearer way to show that if even one number does not fit a specific pattern, it forces contradictions across almost all other numbers. That idea turned out to be the missing step. With it, the researchers completed a full proof showing that Erdős’s original guess was correct.

GPT‑5’s contribution: GPT‑5 surfaced the key insight about how one number constrains all others, enabling the authors to finish the proof of Erdős Problem 848.

Limitations

These case studies are curated illustrations of where GPT‑5 has been useful; they are not a systematic sample, and they do not capture the full range of failure modes. Expert oversight remains essential. GPT‑5 can sometimes hallucinate citations, mechanisms, or proofs that appear plausible; it can be sensitive to scaffolding and warm-up problems; it sometimes misses domain-specific subtleties; and it can follow unproductive lines of reasoning if not corrected. These are active areas of research, and we are working with collaborators to measure and mitigate these failures as we refine future systems.

What’s next

Taken together, these early studies show that GPT‑5 is beginning to help with new types of scientific work. The model is not autonomous, but in expert hands it can help prove theorems, rediscover and extend structures, surface cross-field connections, and generate mechanisms and experiments for scientists to validate.

We also see a trajectory in which these systems improve with more time and compute. If GPT‑5 can meaningfully assist with some research questions in 20 minutes, we expect deeper results when models can spend hours or days reasoning about a problem. Combined with world-class scientists, this points toward the possibility of a step-change in scientific productivity over time.