Early experiments in accelerating science with GPT-5

Science shapes everything from human health to energy production, from national security to our understanding of the universe. If AI can accelerate science—shortening the time it takes to generate new ideas, or to move from an idea to a tested result—the benefits compound across society.

But the pace of innovation remains a constraint. Even when the right idea exists, turning it into a product or treatment can take years. In a recent survey, 60 percent of people in the U.S. said scientific and medical breakthroughs reach them too slowly; 73 percent said we need better ways to accelerate discovery; and 69 percent identified scientific leadership as a top national priority.

Today, we’re releasing “Early science acceleration experiments with GPT‑5,” a paper co-authored with collaborators at universities and national laboratories including Vanderbilt, UC Berkeley, Columbia, Oxford, Cambridge, Lawrence Livermore National Laboratory, and The Jackson Laboratory. It compiles early case studies across math, physics, biology, computer science, astronomy, and materials science in which GPT‑5 helped researchers synthesize known results in a novel way, conduct powerful literature review, accelerate tough computations, and even generate novel proofs of unsolved propositions. The paper also documents limitations. Our goal is to give the community a clear view of what these systems can and cannot do today in research settings.

These case studies show how, in the hands of experts, GPT‑5 is accelerating scientific discovery, and why that acceleration matters:

  • Biology: In a study led by Derya Unutmaz, M.D., scientists spent months trying to explain a puzzling change in human immune cells. GPT‑5 identified the likely mechanism within minutes from an unpublished chart and suggested an experiment that proved it. This kind of speed could help researchers understand diseases faster and develop better treatments.
  • Mathematics: In another case, researchers Mehtaab Sawhney and Mark Sellke were tackling a decades-old open problem originally posed by Paul Erdős. They were stuck on the final step, and GPT‑5 contributed a new idea about how a single out-of-place number breaks the pattern, which helped them complete the proof. Advances like this strengthen the mathematical foundations that many algorithms and security techniques ultimately rely on.
  • Algorithms & optimization: Researchers Sébastien Bubeck and Christian Coester were testing whether a common decision-making method used in robotics and routing was as reliable as people assumed. GPT‑5 found a new, clear example showing the method can fail and also improved a classic result in optimization, the math used to figure out the best way to solve a problem. This type of advance helps engineers better understand the decision-making systems used in robotics, routing, and other real-world applications.

What is OpenAI for Science? 

The mission of OpenAI for Science is to accelerate scientific discovery: to help researchers explore more ideas, test hypotheses faster, and uncover insights that would otherwise take significant time. We do this by pairing frontier models with the right tools, workflows, and collaborations.

We work closely with researchers across academia, industry, and national labs. These collaborations help us understand where the models are useful, where they fail, and how to integrate them into the scientific process—from literature review and proof generation to modeling, simulation, and experimental design.

Our approach combines two complementary beliefs. Specialized scientific tools, such as simulation engines, protein databases, and computer algebra systems, are essential for efficiency and precision. At the same time, scaling foundation models continues to unlock new reasoning abilities: connecting ideas across fields, sketching proofs, proposing mechanisms, and navigating large literatures conceptually rather than by keyword. Where specialized tools exist, we want to use them; where general reasoning is required, we build models designed to handle it. Both paths reinforce each other.

How scientists are working with GPT‑5 today

The most meaningful progress comes from human–AI teams. Scientists set the agenda: they define questions, choose methods, critique ideas, and validate results. GPT‑5 contributes breadth, speed, and the ability to explore many directions in parallel.

Using GPT‑5 effectively is a skill. Researchers learn how to pose questions, when to push back, how to break problems into steps, and what to validate independently. Productive work often looks like dialogue—researcher and model iterating until a promising direction emerges or the idea is discarded.

The current state of GPT‑5 in scientific work 

Across these early studies, GPT‑5 appears able to shorten parts of the research workflow when used by experts. It does not run projects or solve scientific problems autonomously, but it can expand the surface area of exploration and help researchers move faster toward correct results.

  • One emerging capability is conceptual literature search. GPT‑5 can often identify deeper relationships between ideas and retrieve relevant material across languages and less accessible sources. Researchers report finding references, connections, and theses they did not previously know.
  • In mathematics and theoretical computer science, where structure is explicit and feedback loops are fast, GPT‑5 is especially helpful. Mathematicians have used GPT‑5 to generate viable proof outlines in minutes, transforming work that otherwise might have taken days or weeks. In physics and computational domains, the model can propose simplifying transformations or point to analogous structures in other fields.
  • In biology and other empirical sciences, the model can propose mechanisms and design wet-lab experiments to validate them.

We are beyond the point where models only summarize existing knowledge. GPT‑5 can now make early contributions that meaningfully assist researchers under expert oversight. The pace of improvement suggests the potential for deeper acceleration as capabilities and tools advance.

What this looks like in practice: a few case studies

Optimization is the math of finding the “best” option—like the lowest training loss or the shortest route in a network. Gradient descent is a basic optimization method that takes repeated small steps downhill on a function. A recent theorem by Guy Barzilai, Ohad Shamir, and Moslem Zamani asked when the sequence of values visited by gradient descent forms a convex curve over time (a curve with no dips), which makes the algorithm’s behavior easier to analyze and control. The first version of the paper showed this only for very small, conservative step sizes.
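As a toy illustration of this setting (a minimal sketch, not the paper’s theorem; the example function and names are ours), here is plain gradient descent on f(x) = x². With a small step size, the visited values decrease, and the value curve is convex: each successive drop is smaller than the last.

```python
# Gradient descent on f(x) = x**2, recording the value at each iterate.
# Illustrative only -- not the Barzilai-Shamir-Zamani result itself.

def gradient_descent_values(x0, step, n_steps):
    """Run gradient descent on f(x) = x**2 (so f'(x) = 2x) and record f(x_k)."""
    x = x0
    values = [x * x]
    for _ in range(n_steps):
        x = x - step * 2 * x  # one small downhill step
        values.append(x * x)
    return values

values = gradient_descent_values(x0=1.0, step=0.1, n_steps=10)
# Successive drops in value; a convex value curve means these shrink.
drops = [a - b for a, b in zip(values, values[1:])]
```

With step 0.1 each iterate shrinks by a factor of 0.8, so the values form a decreasing geometric sequence—exactly the dip-free, convex shape the theorem concerns.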

Sébastien Bubeck gave GPT‑5 the weaker version of the result and asked whether the condition could be improved. The model proposed a sharper step-size bound and a cleaner, more standard proof, which he then checked carefully by hand; with more thinking time, an internal run of the model even derived the optimal bound from scratch.

GPT‑5’s contribution: GPT‑5 helped Sébastien Bubeck explore a sharper step-size condition and suggest a cleaner proof for a recent convex optimization theorem, which he verified independently.

Nikita Zhivotovskiy and his collaborators proved a new theorem in convex geometry—the study of “well-behaved” shapes where any line between two points stays inside the shape. Convex geometry underlies many models in machine learning and statistics. Once the theorem was done, the natural next question was: where else could this result be useful?
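The convexity definition above can be checked numerically (a toy sketch under our own naming, unrelated to the theorem itself): in a convex set such as the unit disk, every point on the segment between two members stays inside the set.

```python
# Toy check of the convex-set definition: the segment between any two
# points of the unit disk lies entirely inside the disk.

def in_unit_disk(p):
    """True if point p = (x, y) lies in the closed unit disk."""
    x, y = p
    return x * x + y * y <= 1.0

def segment_point(p, q, t):
    """Point at fraction t (0 <= t <= 1) along the segment from p to q."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

a, b = (0.8, 0.0), (0.0, 0.8)
# Sample eleven points along the segment and confirm all stay inside.
segment_inside = all(in_unit_disk(segment_point(a, b, k / 10)) for k in range(11))
```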

Instead of guessing search terms and scanning the literature by hand, Zhivotovskiy gave GPT‑5 the formal statement of the theorem and asked which areas it might connect to. The model pointed to work in density estimation, learning theory, and multi-objective optimization, and surfaced specific references, including several he had not seen and some in other languages.

GPT‑5’s contribution: GPT‑5 helped Nikita Zhivotovskiy identify concrete connections and references across several fields, including materials he had not encountered.

Tim Gowers, a Fields Medal–winning combinatorialist, ran a series of experiments treating GPT‑5 as a “research partner” rather than a tool for homework-style problems. He gave the model hard combinatorics questions he was actively thinking about and asked it to suggest constructions, find counterexamples, or critique partial arguments.

In multiple cases, GPT‑5 quickly spotted flaws or missing cases in candidate constructions and proposed simpler alternatives or counterexamples; in others, it stalled or failed to make progress. Gowers’ overall conclusion was that the model is already useful as a very fast, very knowledgeable critic that can stress-test ideas and save time, even though it does not yet meet his bar for full co-authorship.

GPT‑5’s contribution: GPT‑5 acted as a fast critic for Tim Gowers, spotting flaws, missing cases, and simpler alternatives during exploratory combinatorics work.

Paul Erdős posed a problem about finding the largest set of positive integers satisfying a surprising rule: for any two numbers in the set, their product plus one must be divisible by the square of a prime. Erdős guessed what the largest such set should look like, but the problem remained open for decades.
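To make the rule concrete, here is a toy check of the condition (illustrative only; the helper names are ours and this plays no role in the proof): a set passes when every pairwise product plus one has a squared prime factor.

```python
# Toy check of the divisibility rule from the Erdős problem described above.

def divisible_by_prime_square(n):
    """True if n is divisible by p**2 for some prime p."""
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return True  # smaller primes were stripped out, so d is prime here
        while n % d == 0:
            n //= d
        d += 1
    return False

def satisfies_rule(numbers):
    """Every pairwise product plus one must have a squared prime factor."""
    nums = sorted(numbers)
    return all(divisible_by_prime_square(a * b + 1)
               for i, a in enumerate(nums) for b in nums[i + 1:])
```

For example, {1, 3} passes the rule (1·3 + 1 = 4 = 2²), while {3, 7} fails it (3·7 + 1 = 22 = 2·11, which is squarefree).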

Sawhney and Sellke explored the structure of the problem and then asked GPT‑5 to help analyze how a single “out-of-place” number would affect the entire set. GPT‑5 suggested a clearer way to show that if even one number does not fit a specific pattern, it forces contradictions across almost all other numbers. That idea turned out to be the missing step. With it, the researchers completed a full proof showing that Erdős’s original guess was correct.

GPT‑5’s contribution: GPT‑5 surfaced the key insight about how one number constrains all others, enabling the authors to finish the proof of Erdős Problem 848.

Limitations

These case studies are curated illustrations of where GPT‑5 has been useful; they are not a systematic sample, and they do not capture the full range of failure modes. Expert oversight remains essential. GPT‑5 can sometimes hallucinate citations, mechanisms, or proofs that appear plausible; it can be sensitive to scaffolding and warm-up problems; it sometimes misses domain-specific subtleties; and it can follow unproductive lines of reasoning if not corrected. These are active areas of research, and we are working with collaborators to measure and mitigate these failures as we refine future systems.

What’s next

Taken together, these early studies show that GPT‑5 is beginning to help with new types of scientific work. The model is not autonomous, but in expert hands it can help prove theorems, rediscover and extend structures, surface cross-field connections, and generate mechanisms and experiments for scientists to validate.

We also see a trajectory in which these systems improve with more time and compute. If GPT‑5 can meaningfully assist with some research questions in 20 minutes, we expect deeper results when models can spend hours or days reasoning about a problem. Combined with world-class scientists, this points toward the possibility of a step-change in scientific productivity over time.