Landhuis, E. Scientific literature: information overload. Nature 535, 457–458 (2016).
Llanos, E. J. et al. Exploration of the chemical space and its three historical regimes. Proc. Natl Acad. Sci. USA 116, 12660–12665 (2019).
Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).
Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).
Ramos, M. C., Collison, C. J. & White, A. D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 16, 2514–2572 (2025).
Zhang, C. et al. SynAsk: unleashing the power of large language models in organic synthesis. Chem. Sci. 16, 43–56 (2025).
Dubey, A. et al. The Llama 3 herd of models. Preprint at http://arxiv.org/abs/2407.21783 (2024).
White, A. D. The future of chemistry is language. Nat. Rev. Chem. 7, 457–458 (2023).
Achiam, J. et al. GPT-4 technical report. Preprint at http://arxiv.org/abs/2303.08774 (2023).
Rinehart, N. I. et al. A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C–N couplings. Science 381, 965–972 (2023).
Li, S.-W. et al. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat. Commun. 14, 3569 (2023).
Das, M., Ghosh, A. & Sunoj, R. B. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J. Comput. Chem. 45, 1160–1176 (2024).
Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).
Vaucher, A. C. et al. Inferring experimental procedures from text-based representations of chemical reactions. Nat. Commun. 12, 2573 (2021).
Chen, L., Zaharia, M. & Zou, J. How is ChatGPT’s behavior changing over time? Harvard Data Science Review https://hdsr.mitpress.mit.edu/pub/y95zitmz/release/2 (2024).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Tu, Z., Stuyver, T. & Coley, C. W. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem. Sci. 14, 226–244 (2023).
Ruan, Y. et al. An automatic end-to-end chemical synthesis development platform powered by large language models. Nat. Commun. 15, 10160 (2024).
Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).
Hu, E. J. et al. Lora: low-rank adaptation of large language models. ICLR 1, 3 (2022).
Jablonka, K. M., Schwaller, P., Ortega-Guerrero, A. & Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 6, 161–169 (2024).
Zdrazil, B. & Guha, R. The rise and fall of a scaffold: a trend analysis of scaffolds in the medicinal chemistry literature. J. Med. Chem. 61, 4688–4703 (2017).
Lu, W. et al. Enhanced ligand discovery through generative AI and latent-space exploration: application to the Mizoroki-Heck reaction. Preprint at https://chemrxiv.org/engage/chemrxiv/article-details/65cfb263e9ebbb4db9859eb7 (2024).
Hoveyda, A. H., Malcolmson, S. J., Meek, S. J. & Zhugralin, A. R. Catalytic enantioselective olefin metathesis in natural product synthesis. Angew. Chem. Int. Ed. 49, 34–44 (2010).
Sinclair, F., Alkattan, M., Prunet, J. & Shaver, M. P. Olefin cross metathesis and ring-closing metathesis in polymer chemistry. Polym. Chem. 8, 3385–3398 (2017).
Gleiter, R. & Werz, D. B. Alkynes between main group elements: from dumbbells via rods to squares and tubes. Chem. Rev. 110, 4447–4488 (2010).
Chinchilla, R. & Nájera, C. The Sonogashira reaction: a booming methodology in synthetic organic chemistry. Chem. Rev. 107, 874–922 (2007).
Chen, T. et al. Diaryl ether: a privileged scaffold for drug and agrochemical discovery. J. Agric. Food Chem. 68, 9839–9877 (2020).
McConnell, J. R., Hitt, J. E., Daugs, E. D. & Rey, T. A. The Swern oxidation: development of a high-temperature semicontinuous process. Org. Process Res. Dev. 12, 940–945 (2008).
Bratko, I. et al. Triazolium salts as appropriate catalytic scaffolds for 1,4-additions to α,β-unsaturated carbonyls. Eur. J. Org. Chem. 2014, 2160–2167 (2014).
Roman, D., Sauer, M. & Beemelmanns, C. Applications of the Horner–Wadsworth–Emmons olefination in modern natural product synthesis. Synthesis 53, 2713–2739 (2021).
Palsuledesai, C. C. & Distefano, M. D. Protein prenylation: enzymes, therapeutics, and biotechnology applications. ACS Chem. Biol. 10, 51–62 (2015).
Castellino, N. J., Montgomery, A. P., Danon, J. J. & Kassiou, M. Late-stage functionalization for improving drug-like molecular properties. Chem. Rev. 123, 8127–8153 (2023).
Kaes, C., Katz, A. & Hosseini, M. W. Bipyridine: the most widely used ligand. A review of molecules comprising at least two 2,2′-bipyridine units. Chem. Rev. 100, 3553–3590 (2000).
Treat, N. J. et al. Metal-free atom transfer radical polymerization. J. Am. Chem. Soc. 136, 16096–16101 (2014).
Beaujuge, P. M. & Reynolds, J. R. Color control in π-conjugated organic polymers for use in electrochromic devices. Chem. Rev. 110, 268–320 (2010).
Park, S.-Y. et al. Abscisic acid inhibits type 2C protein phosphatases via the PYR/PYL family of START proteins. Science 324, 1068–1071 (2009).
Kim, H. et al. Synthesis and in vitro biological activity of retinyl retinoate, a novel hybrid retinoid derivative. Bioorg. Med. Chem. 16, 6387–6393 (2008).
Jensen, T., Pedersen, H., Bang-Andersen, B., Madsen, R. & Jørgensen, M. Palladium-catalyzed aryl amination–Heck cyclization cascade: a one-flask approach to 3-substituted indoles. Angew. Chem. Int. Ed. 47, 888–890 (2008).
Fan, R., Wen, H., Chen, Z., Xia, Y. & Fang, W. A general protocol toward synthesis of 3-methylindoles using acenaphthoimidazolyidene-ligated oxazoline palladacycle. Org. Lett. 26, 22–28 (2024).
Ahneman, D. T. et al. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).
Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).
Li, H. et al. Kernel-elastic autoencoder for molecular design. PNAS Nexus 3, 168 (2024).
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. 35th International Conference on Machine Learning 80, 2323–2332 (PMLR, 2018).
Probst, D. & Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 12, 12 (2020).
Douze, M. et al. The FAISS library. Preprint at http://arxiv.org/abs/2401.08281 (2025).
Zhao, Y. et al. PyTorch FSDP: experiences on scaling fully sharded data parallel. Proc. VLDB Endow. 16, 3848–3860 (2023).
Rajbhandari, S. et al. ZeRO-infinity: breaking the GPU memory wall for extreme scale deep learning. In Proc. International Conference for High Performance Computing, Networking, Storage and Analysis 59 (ACM, 2021).
Zobel, J. & Moffat, A. Inverted files for text search engines. ACM Comput. Surv. 38, 6-es (2006).
Andronov, M. et al. Reagent prediction with a molecular transformer improves reaction data quality. Chem. Sci. 14, 3235–3246 (2023).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).
Li, H. MOSAIC: Multiple Optimized Specialists for AI-assisted Chemical Prediction. Zenodo https://doi.org/10.5281/zenodo.18002953 (2025).
Sarkar, S., Ghosh, S., Kurandina, D., Noffel, Y. & Gevorgyan, V. Enhanced excited-state hydricity of Pd–H allows for unusual head-to-tail hydroalkenylation of alkenes. J. Am. Chem. Soc. 145, 12224–12232 (2023).
Bodnar, A. K. & Newhouse, T. R. Accessing Z-enynes via cobalt-catalyzed propargylic dehydrogenation. Angew. Chem. Int. Ed. 63, e202402638 (2024).
Geunes, E. P., Meinhardt, J. M., Wu, E. J. & Knowles, R. R. Photocatalytic anti-Markovnikov hydroamination of alkenes with primary heteroaryl amines. J. Am. Chem. Soc. 145, 21738–21744 (2023).
Ratushnyy, M., Kvasovs, N., Sarkar, S. & Gevorgyan, V. Visible-light-induced palladium-catalyzed generation of aryl radicals from aryl triflates. Angew. Chem. Int. Ed. 59, 10316–10320 (2020).
Xie, K. A. et al. A unified method for oxidative and reductive decarboxylative arylation with orange light-driven Ir/Ni metallaphotoredox catalysis. J. Am. Chem. Soc. 146, 25780–25787 (2024).