Accelerating life sciences research


At OpenAI, we believe that AI can meaningfully accelerate life science innovation. To test this belief, we collaborated with the Applied AI team at Retro Bio, a longevity biotech startup, to create and research the impact of GPT‑4b micro, a miniature version of GPT‑4o specialized for protein engineering.

In vitro, the proteins redesigned with the model, variants of the Yamanaka factors SOX2 and KLF4, achieved more than 50-fold higher expression of stem cell reprogramming markers than wild-type controls. They also demonstrated enhanced DNA damage repair capabilities, indicating higher rejuvenation potential compared to baseline. This finding, made in early 2025, has now been validated by replication across multiple donors, cell types, and delivery methods, with confirmation of full pluripotency and genomic stability in derived iPSC lines. To ensure the findings are discoverable and replicable to benefit the life sciences industry, we are now sharing insights into the research and development of GPT‑4b micro.

An experimental GPT model for protein engineering

To test our belief that AI can be used to accelerate life sciences research, we designed and trained a custom model—GPT‑4b micro—to possess a broad base of knowledge and skills across biology, with a particular focus on steerability and flexibility to enable advanced use cases such as protein engineering. We initialized it from a scaled-down version of GPT‑4o to take advantage of GPT models’ existing knowledge, then further trained it on a dataset composed mostly of protein sequences, along with biological text and tokenized 3D structure data, elements most protein language models omit.

A large portion of the data was enriched to contain additional contextual information about the proteins in the form of textual descriptions, co-evolutionary homologous sequences, and groups of proteins that are known to interact. This context allows GPT‑4b micro to be prompted to generate sequences with specific desired properties and, since most of the data is structure-free, the model handles proteins with intrinsically disordered regions just as well as structured proteins. This is particularly useful for targets like the Yamanaka factors, whose activity depends on forming numerous transient interactions with a diverse array of binding partners, rather than adopting a single stable structure (Figure 2).
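As a purely illustrative sketch of how this kind of context could be packed into a single prompt, the Python below concatenates a textual description, interacting proteins, and homologous sequences until a token budget is reached. The article does not describe the real prompt format or tokenizer, so the tag names, the `ProteinContext` container, the packing order, and the one-token-per-character count are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MAX_PROMPT_TOKENS = 64_000  # prompt size mentioned in the article


@dataclass
class ProteinContext:
    """Hypothetical container for the context types the article lists."""
    description: str
    homologs: list[str] = field(default_factory=list)
    interactors: list[str] = field(default_factory=list)


def count_tokens(text: str) -> int:
    # Assumption: one token per character. The real tokenizer is not public.
    return len(text)


def build_prompt(ctx: ProteinContext, budget: int = MAX_PROMPT_TOKENS) -> str:
    """Pack the description, then interactors, then homologs within the budget."""
    parts = [f"<description>{ctx.description}</description>"]
    used = count_tokens(parts[0])
    if used > budget:
        raise ValueError("description alone exceeds the token budget")
    for tag, seqs in (("interactor", ctx.interactors), ("homolog", ctx.homologs)):
        for seq in seqs:
            piece = f"<{tag}>{seq}</{tag}>"
            cost = count_tokens(piece) + 1  # +1 for the newline separator
            if used + cost > budget:
                # Stop at the first piece that does not fit.
                return "\n".join(parts)
            parts.append(piece)
            used += cost
    return "\n".join(parts)


# Toy sequences, not real proteins.
example = ProteinContext("toy", homologs=["AAAA", "CCCC"], interactors=["GGGG"])
print(build_prompt(example))
```

A real pipeline would presumably rank homologs and interactors by relevance before packing them; this sketch simply keeps them in the order given.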

By training on proteins enriched with additional evolutionary and functional context, we substantially increased the effective context length of our training examples beyond that of standalone sequences. Consequently, we found that we could run prompts as large as 64,000 tokens during inference and continue to observe gains in controllability and output quality. While common in text LLMs, this context size is unprecedented in protein sequence models.

During development we observed the emergence of scaling laws similar to those seen in language models—larger models trained on larger datasets yielded predictable gains in perplexity and downstream protein benchmarks, allowing us to iterate at small scale before training the final GPT‑4b micro model. However, in silico evals for protein AI models are often of limited value, as it is unclear if such improvements translate to increased utility in the real world. To demonstrate that the model is capable of accelerating therapeutic development, we worked with Retro’s scientists, who used the model to re-engineer proteins relevant to their cell-reprogramming research program.
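To make the idea of "predictable gains" concrete, here is a minimal sketch of fitting a power law to small-scale runs and extrapolating to a larger one. The pure power-law form `loss ≈ a * size**(-b)` and every number below are assumptions chosen for illustration; the article does not report the functional form or any measurements.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ≈ a * size**(-b) by least squares on log-log values."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict(a, b, size):
    """Extrapolate the fitted curve to a new model size."""
    return a * size ** (-b)


# Synthetic data generated from a known curve, not measurements.
sizes = [1e6, 1e7, 1e8]
losses = [8.0 * s ** -0.1 for s in sizes]

a, b = fit_power_law(sizes, losses)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e9: {predict(a, b, 1e9):.3f}")
```

If the small runs fall on a straight line in log-log space, the fitted curve gives a rough forecast for the larger run before committing compute to it.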

AI-assisted reengineering of SOX2 and KLF4 to increase stem cell reprogramming efficiency

The Yamanaka factors (OCT4, SOX2, KLF4, and MYC, collectively OSKM) are some of the most important proteins in regenerative biology today. They are named after Shinya Yamanaka, who discovered their ability to reprogram adult cells into pluripotent stem cells, a breakthrough that earned him the Nobel Prize in Physiology or Medicine in 2012. Unfortunately, reprogramming with them is inefficient: typically less than 0.1% of cells convert during treatment, and the process can take three weeks or more. Efficiency drops further in cells from aged or diseased donors, so finding more efficient variants remains an active and important research focus.

Directly optimizing the protein sequences is hard. SOX2 contains 317 amino acids and KLF4 contains 513; the number of possible variants is on the order of 10^1000, so traditional “directed-evolution” screens that mutate a handful of residues at a time can explore only a minuscule fraction of the design space. A leading academic effort tested a few thousand SOX2 mutants and found a handful of triple-mutants with a modest gain, while 15 years of work on chimeric SOX proteins has yielded variants that differ from natural SOX constituents by only five residues.
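The size of that design space can be checked with a few lines of arithmetic. The sketch below assumes fixed-length sequences over the 20 standard amino acids and counts substitutions only; treating the "10^1000" figure as the joint space of both proteins is one reading of the text, not something the article states.

```python
import math

AMINO_ACIDS = 20  # standard amino-acid alphabet (assumption: no non-standard residues)


def log10_sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


sox2 = log10_sequence_space(317)  # about 412
klf4 = log10_sequence_space(513)  # about 667
joint = sox2 + klf4               # about 1080, i.e. on the order of 10^1000

# Triple mutants of SOX2: choose 3 positions, 19 alternative residues at each.
sox2_triple_mutants = math.comb(317, 3) * 19 ** 3

print(f"SOX2: 10^{sox2:.0f}, KLF4: 10^{klf4:.0f}, joint: 10^{joint:.0f}")
print(f"SOX2 triple mutants: {sox2_triple_mutants:.2e}")
```

Even the triple-mutant neighborhood of SOX2 alone contains tens of billions of sequences, so a screen of a few thousand mutants samples only a tiny slice of it.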

The team at Retro built a wet lab screening platform using human fibroblast (skin and connective tissue) cells, initially validating it with baseline OSKM and SOX2 variants manually designed by Retro’s scientists as part of their pilot screen (Figure 3). Then, they asked GPT‑4b micro to propose a diverse set of “RetroSOX” sequences. Over 30% of the model’s suggestions in the screen outperformed wild‑type SOX2 at expressing key pluripotency markers, even though they differed by more than 100 amino acids on average. For comparison, in traditional screens, hit rates below 10% are typical.
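The two quantities in this comparison, hit rate and sequence divergence, are simple to compute once screen readouts are available. The following is a minimal sketch, assuming a hit means a marker score above the wild-type baseline and measuring divergence as Levenshtein edit distance; the article does not specify either definition, and the scores and sequences below are toy values.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def hit_rate(scores, baseline):
    """Fraction of variants whose score exceeds the wild-type baseline."""
    return sum(1 for s in scores if s > baseline) / len(scores)


# Toy readouts normalized so that wild type scores 1.0 (not real data).
variant_scores = [1.2, 0.8, 1.5, 0.9]
print(hit_rate(variant_scores, baseline=1.0))

# Toy sequences (not real proteins).
print(edit_distance("MKVLAT", "MRVLT"))
```

Edit distance is used rather than a position-by-position comparison because designed variants need not have the same length as the wild-type protein.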

Combining the top RetroSOX and RetroKLF variants produced the largest gains. Across three independent experiments, fibroblasts showed a dramatic rise in both early (SSEA-4) and late (TRA-1-60, NANOG) markers, with late markers appearing several days sooner than under the wild-type OSKM cocktail (Figure 5).

In addition, the RetroSOX and RetroKLF variants were validated by alkaline phosphatase (AP) staining at day 10, confirming that the colonies not only express late-stage pluripotency markers but also exhibit robust AP activity indicative of pluripotency (Figure 6).

To further confirm the improved reprogramming efficiency and explore clinical potential, we tested a different delivery method (mRNA instead of viral vectors) and another cell type, mesenchymal stromal cells (MSCs), derived from three middle-aged human donors (over 50 years old). Within just 7 days, more than 30% of the cells began expressing key pluripotency markers (SSEA-4 and TRA-1-60), and by day 12, numerous colonies appeared with morphology similar to typical iPSCs (Figure 7, left and center). Over 85% of these cells activated endogenous expression of critical stem cell markers, including OCT4, NANOG, SOX2, and TRA-1-60.

We then verified that these RetroFactor-derived iPSCs could successfully differentiate into all three primary germ layers (endoderm, ectoderm, and mesoderm). Additionally, we expanded multiple monoclonal iPSC lines over several passages, confirming healthy karyotypes (Figure 7, right) and genomic stability suitable for cell therapies. These results consistently surpassed benchmarks obtained from conventional iPSC lines generated by contract research organizations using standard factors, further supporting the robustness of our engineered variants. Moreover, they provide evidence of enhanced iPSC generation across different delivery modalities and cell types.

Taken together, the high hit rates, deep sequence edits, accelerated marker onset, and AP+ colony formation provide early evidence that AI-guided protein design can substantially accelerate progress in stem cell reprogramming research.

Reengineered variants enhance DNA damage repair

Motivated by these results, we next investigated the rejuvenation potential of our re-engineered variants, specifically examining their ability to restore youthful characteristics to aged cells. We focused on DNA damage, which impairs cellular function and is a canonical hallmark of aging. Earlier work has demonstrated that Yamanaka factors can erase DNA damage-related aging markers in cells derived from mice without fully reverting cell identity. We sought to find out whether our variants showed enhanced rejuvenation capabilities relative to baseline OSKM.

In our DNA‑damage assay, cells treated with the RetroSOX/KLF cocktail showed visibly lower γ‑H2AX intensity, a marker of double‑strand breaks, than cells reprogrammed with standard OSKM or a fluorescent control (Figure 8).

These results suggest that the RetroSOX/KLF cocktail reduces DNA damage more effectively than the original Yamanaka factors. By ameliorating one of the core hallmarks of cellular aging, the engineered variants offer a potential path toward improved cell rejuvenation and use in future therapies.

Where we go from here

To OpenAI, this work is an illustration of how quickly a domain-specific model can deliver breakthrough results on a focused scientific problem. “When researchers bring deep domain insight to our language-model tooling, problems that once took years can shift in days,” says Boris Power, who leads research partnerships at OpenAI. “We look forward to seeing what other advances emerge as more teams pair their expertise with the models we’re building.”