Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery

Author affiliations

* Corresponding authors

^a Citrine Informatics, USA
E-mail: bryce@citrine.io

^b University of Chicago, USA

^c Argonne National Laboratory, USA

^d Stanford University, USA

^e National Institute of Standards and Technology, USA

^f SLAC National Accelerator Laboratory, USA

Abstract

Traditional machine learning (ML) metrics overestimate model performance for materials discovery. We introduce (1) leave-one-cluster-out cross-validation (LOCO CV) and (2) a simple nearest-neighbor benchmark to show that model performance in discovery applications strongly depends on the problem, data sampling, and extrapolation. Our results suggest that ML-guided iterative experimentation may outperform standard high-throughput screening for discovering breakthrough materials like high-T_c superconductors.
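The abstract names LOCO CV and a nearest-neighbor benchmark without spelling out the procedure, so the following is only a minimal sketch of how the two ideas might fit together. It assumes clusters come from a small k-means on the feature vectors (with farthest-first initialization, chosen here for determinism) and that the benchmark is plain 1-nearest-neighbor. The function names `kmeans_labels`, `loco_splits`, `nn_predict`, and `loco_cv_mae`, and the toy data, are hypothetical and not taken from the article.

```python
import math


def kmeans_labels(X, k, n_iter=20):
    """Tiny Lloyd's k-means (assumed clustering step); returns one label per row."""
    # Farthest-first initialization keeps the sketch deterministic.
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(math.dist(x, c) for c in centers)))
    labels = [0] * len(X)
    for _ in range(n_iter):
        # Assign every point to its closest center.
        labels = [min(range(k), key=lambda c: math.dist(x, centers[c])) for x in X]
        # Move each center to the mean of its members (left unchanged if empty).
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def loco_splits(labels):
    """Yield (held_out, train_idx, test_idx), holding out one whole cluster at a time."""
    for held_out in sorted(set(labels)):
        train = [i for i, lab in enumerate(labels) if lab != held_out]
        test = [i for i, lab in enumerate(labels) if lab == held_out]
        yield held_out, train, test


def nn_predict(X_train, y_train, x):
    """1-nearest-neighbor benchmark: copy the target of the closest training point."""
    j = min(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    return y_train[j]


def loco_cv_mae(X, y, labels, predict=nn_predict):
    """Mean absolute error of `predict` on each held-out cluster."""
    errors = {}
    for held_out, train, test in loco_splits(labels):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        abs_err = [abs(predict(X_tr, y_tr, X[i]) - y[i]) for i in test]
        errors[held_out] = sum(abs_err) / len(abs_err)
    return errors


# Toy data (made up): three well-separated blobs, target = sum of the features.
blob_origins = [(0, 0), (10, 0), (0, 10)]
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
X = [[cx + dx, cy + dy] for cx, cy in blob_origins for dx, dy in offsets]
y = [a + b for a, b in X]

labels = kmeans_labels(X, k=3)
errors = loco_cv_mae(X, y, labels)
for cluster, mae in errors.items():
    print(f"held-out cluster {cluster}: 1-NN MAE = {mae:.2f}")
```

Because each test fold is an entire cluster, every prediction has to be made from training points in other clusters, which is what makes the score a measure of extrapolation rather than interpolation. Any other model can be passed as `predict` and compared against the nearest-neighbor baseline on the same folds.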

This article is part of the themed collections: MSDE most-read Q1 2019, 2018 MSDE Hot Articles and Machine Learning and Data Science in Materials Design

This article is Open Access

Article information

DOI: https://doi.org/10.1039/C8ME00012C

Article type: Communication

Submitted: 05 Mar 2018

Accepted: 11 Jul 2018

First published: 17 Aug 2018

Mol. Syst. Des. Eng., 2018, 3, 819-825