* Corresponding authors
a Citrine Informatics, USA
E-mail: bryce@citrine.io
b University of Chicago, USA
c Argonne National Laboratory, USA
d Stanford University, USA
e National Institute of Standards and Technology, USA
f SLAC National Accelerator Laboratory, USA
Abstract
Traditional machine learning (ML) metrics overestimate model performance for materials discovery. We introduce (1) leave-one-cluster-out cross-validation (LOCO CV) and (2) a simple nearest-neighbor benchmark to show that model performance in discovery applications depends strongly on the problem, the data sampling, and the degree of extrapolation. Our results suggest that ML-guided iterative experimentation may outperform standard high-throughput screening for discovering breakthrough materials such as high-Tc superconductors.
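The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cluster labels are already available (how the clusters are formed is not stated here), uses Euclidean distance, a one-nearest-neighbor predictor, and mean absolute error, and runs on made-up toy data. The names `loco_splits`, `nearest_neighbor_predict`, and `loco_cv_mae` are hypothetical.

```python
import math
from collections import defaultdict


def loco_splits(cluster_labels):
    """Yield (held_out_label, train_idx, test_idx), holding out one whole cluster at a time."""
    by_cluster = defaultdict(list)
    for i, label in enumerate(cluster_labels):
        by_cluster[label].append(i)
    for label in sorted(by_cluster):
        test_idx = by_cluster[label]
        train_idx = [i for i, other in enumerate(cluster_labels) if other != label]
        yield label, train_idx, test_idx


def nearest_neighbor_predict(train_X, train_y, x):
    """Baseline: return the target of the closest training point (Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]


def loco_cv_mae(X, y, cluster_labels, predict=nearest_neighbor_predict):
    """Mean absolute error per held-out cluster; any predict(train_X, train_y, x) can be passed."""
    errors = {}
    for label, train_idx, test_idx in loco_splits(cluster_labels):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        abs_err = [abs(predict(train_X, train_y, X[i]) - y[i]) for i in test_idx]
        errors[label] = sum(abs_err) / len(abs_err)
    return errors


# Toy data (invented for illustration): three well-separated clusters in feature space.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.0, 0.2]]
y = [1.0, 1.1, 5.0, 5.2, 9.0, 9.1]
labels = [0, 0, 1, 1, 2, 2]  # assumed to come from some prior clustering step

per_cluster_mae = loco_cv_mae(X, y, labels)
for label, mae in per_cluster_mae.items():
    print(f"held-out cluster {label}: MAE = {mae:.2f}")
```

Because every test point belongs to a cluster the model never saw, each fold forces extrapolation. With random splits on the same toy data, each point's near-duplicate neighbor would usually stay in the training set and the error would look far smaller, which is the kind of overestimate the abstract describes. Passing a different `predict` function allows a model to be compared against the nearest-neighbor baseline on identical folds.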
- This article is part of the themed collections: MSDE most-read Q1 2019, 2018 MSDE Hot Articles and Machine Learning and Data Science in Materials Design
This article is Open Access