Collective intelligence for AI-assisted chemical synthesis

References
  • Landhuis, E. Scientific literature: information overload. Nature 535, 457–458 (2016).

  • Llanos, E. J. et al. Exploration of the chemical space and its three historical regimes. Proc. Natl Acad. Sci. USA 116, 12660–12665 (2019).

  • Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Autonomous chemical research with large language models. Nature 624, 570–578 (2023).

  • Bran, A. M. et al. Augmenting large language models with chemistry tools. Nat. Mach. Intell. 6, 525–535 (2024).

  • Ramos, M. C., Collison, C. J. & White, A. D. A review of large language models and autonomous agents in chemistry. Chem. Sci. 16, 2514–2572 (2025).

  • Zhang, C. et al. SynAsk: unleashing the power of large language models in organic synthesis. Chem. Sci. 16, 43–56 (2025).

  • Dubey, A. et al. The Llama 3 herd of models. Preprint at http://arxiv.org/abs/2407.21783 (2024).

  • White, A. D. The future of chemistry is language. Nat. Rev. Chem. 7, 457–458 (2023).

  • Achiam, J. et al. GPT-4 technical report. Preprint at http://arxiv.org/abs/2303.08774 (2023).

  • Rinehart, N. I. et al. A machine-learning tool to predict substrate-adaptive conditions for Pd-catalyzed C–N couplings. Science 381, 965–972 (2023).

  • Li, S.-W. et al. Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge. Nat. Commun. 14, 3569 (2023).

  • Das, M., Ghosh, A. & Sunoj, R. B. Advances in machine learning with chemical language models in molecular property and reaction outcome predictions. J. Comput. Chem. 45, 1160–1176 (2024).

  • Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).

  • Vaucher, A. C. et al. Inferring experimental procedures from text-based representations of chemical reactions. Nat. Commun. 12, 2573 (2021).

  • Chen, L., Zaharia, M. & Zou, J. How is ChatGPT’s behavior changing over time? Harvard Data Science Review https://hdsr.mitpress.mit.edu/pub/y95zitmz/release/2 (2024).

  • Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).

  • Tu, Z., Stuyver, T. & Coley, C. W. Predictive chemistry: machine learning for reaction deployment, reaction development, and reaction discovery. Chem. Sci. 14, 226–244 (2023).

  • Ruan, Y. et al. An automatic end-to-end chemical synthesis development platform powered by large language models. Nat. Commun. 15, 10160 (2024).

  • Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).

  • Hu, E. J. et al. LoRA: low-rank adaptation of large language models. In Proc. International Conference on Learning Representations (ICLR, 2022).

  • Jablonka, K. M., Schwaller, P., Ortega-Guerrero, A. & Smit, B. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 6, 161–169 (2024).

  • Zdrazil, B. & Guha, R. The rise and fall of a scaffold: a trend analysis of scaffolds in the medicinal chemistry literature. J. Med. Chem. 61, 4688–4703 (2017).

  • Lu, W. et al. Enhanced ligand discovery through generative AI and latent-space exploration: application to the Mizoroki-Heck reaction. Preprint at https://chemrxiv.org/engage/chemrxiv/article-details/65cfb263e9ebbb4db9859eb7 (2024).

  • Hoveyda, A. H., Malcolmson, S. J., Meek, S. J. & Zhugralin, A. R. Catalytic enantioselective olefin metathesis in natural product synthesis. Angew. Chem. Int. Ed. 49, 34–44 (2010).

  • Sinclair, F., Alkattan, M., Prunet, J. & Shaver, M. P. Olefin cross metathesis and ring-closing metathesis in polymer chemistry. Polym. Chem. 8, 3385–3398 (2017).

  • Gleiter, R. & Werz, D. B. Alkynes between main group elements: from dumbbells via rods to squares and tubes. Chem. Rev. 110, 4447–4488 (2010).

  • Chinchilla, R. & Nájera, C. The Sonogashira reaction: a booming methodology in synthetic organic chemistry. Chem. Rev. 107, 874–922 (2007).

  • Chen, T. et al. Diaryl ether: a privileged scaffold for drug and agrochemical discovery. J. Agric. Food Chem. 68, 9839–9877 (2020).

  • McConnell, J. R., Hitt, J. E., Daugs, E. D. & Rey, T. A. The Swern oxidation: development of a high-temperature semicontinuous process. Org. Process Res. Dev. 12, 940–945 (2008).

  • Bratko, I. et al. Triazolium salts as appropriate catalytic scaffolds for 1,4-additions to α,β-unsaturated carbonyls. Eur. J. Org. Chem. 2014, 2160–2167 (2014).

  • Roman, D., Sauer, M. & Beemelmanns, C. Applications of the Horner–Wadsworth–Emmons olefination in modern natural product synthesis. Synthesis 53, 2713–2739 (2021).

  • Palsuledesai, C. C. & Distefano, M. D. Protein prenylation: enzymes, therapeutics, and biotechnology applications. ACS Chem. Biol. 10, 51–62 (2015).

  • Castellino, N. J., Montgomery, A. P., Danon, J. J. & Kassiou, M. Late-stage functionalization for improving drug-like molecular properties. Chem. Rev. 123, 8127–8153 (2023).

  • Kaes, C., Katz, A. & Hosseini, M. W. Bipyridine: the most widely used ligand. A review of molecules comprising at least two 2,2′-bipyridine units. Chem. Rev. 100, 3553–3590 (2000).

  • Treat, N. J. et al. Metal-free atom transfer radical polymerization. J. Am. Chem. Soc. 136, 16096–16101 (2014).

  • Beaujuge, P. M. & Reynolds, J. R. Color control in π-conjugated organic polymers for use in electrochromic devices. Chem. Rev. 110, 268–320 (2010).

  • Park, S.-Y. et al. Abscisic acid inhibits type 2C protein phosphatases via the PYR/PYL family of START proteins. Science 324, 1068–1071 (2009).

  • Kim, H. et al. Synthesis and in vitro biological activity of retinyl retinoate, a novel hybrid retinoid derivative. Bioorg. Med. Chem. 16, 6387–6393 (2008).

  • Jensen, T., Pedersen, H., Bang-Andersen, B., Madsen, R. & Jørgensen, M. Palladium-catalyzed aryl amination–Heck cyclization cascade: a one-flask approach to 3-substituted indoles. Angew. Chem. Int. Ed. 47, 888–890 (2008).

  • Fan, R., Wen, H., Chen, Z., Xia, Y. & Fang, W. A general protocol toward synthesis of 3-methylindoles using acenaphthoimidazolylidene-ligated oxazoline palladacycle. Org. Lett. 26, 22–28 (2024).

  • Ahneman, D. T. et al. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

  • Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature 571, 95–98 (2019).

  • Li, H. et al. Kernel-elastic autoencoder for molecular design. PNAS Nexus 3, 168 (2024).

  • Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. 35th International Conference on Machine Learning 80, 2323–2332 (PMLR, 2018).

  • Probst, D. & Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 12, 12 (2020).

  • Douze, M. et al. The FAISS library. Preprint at http://arxiv.org/abs/2401.08281 (2025).

  • Zhao, Y. et al. PyTorch FSDP: experiences on scaling fully sharded data parallel. Proc. VLDB Endow. 16, 3848–3860 (2023).

  • Rajbhandari, S. et al. ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning. In Proc. International Conference for High Performance Computing, Networking, Storage and Analysis 59 (ACM, 2021).

  • Zobel, J. & Moffat, A. Inverted files for text search engines. ACM Comput. Surv. 38, 6-es (2006).

  • Andronov, M. et al. Reagent prediction with a molecular transformer improves reaction data quality. Chem. Sci. 14, 3235–3246 (2023).

  • Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  • Vaswani, A. et al. Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017).

  • Li, H. MOSAIC: Multiple Optimized Specialists for AI-assisted Chemical Prediction. Zenodo https://doi.org/10.5281/zenodo.18002953 (2025).

  • Sarkar, S., Ghosh, S., Kurandina, D., Noffel, Y. & Gevorgyan, V. Enhanced excited-state hydricity of Pd–H allows for unusual head-to-tail hydroalkenylation of alkenes. J. Am. Chem. Soc. 145, 12224–12232 (2023).

  • Bodnar, A. K. & Newhouse, T. R. Accessing Z-enynes via cobalt-catalyzed propargylic dehydrogenation. Angew. Chem. Int. Ed. 63, e202402638 (2024).

  • Geunes, E. P., Meinhardt, J. M., Wu, E. J. & Knowles, R. R. Photocatalytic anti-Markovnikov hydroamination of alkenes with primary heteroaryl amines. J. Am. Chem. Soc. 145, 21738–21744 (2023).

  • Ratushnyy, M., Kvasovs, N., Sarkar, S. & Gevorgyan, V. Visible-light-induced palladium-catalyzed generation of aryl radicals from aryl triflates. Angew. Chem. Int. Ed. 59, 10316–10320 (2020).

  • Xie, K. A. et al. A unified method for oxidative and reductive decarboxylative arylation with orange light-driven Ir/Ni metallaphotoredox catalysis. J. Am. Chem. Soc. 146, 25780–25787 (2024).
