posted on 2023-11-29, 18:07, authored by Christopher Sutton, Mario Boley, Luca M. Ghiringhelli, Matthias Rupp, Jilles Vreeken, Matthias Scheffler
Although machine learning (ML) models promise to substantially accelerate the discovery of
novel materials, their performance is often still insufficient to draw reliable conclusions.
Improved ML models are therefore actively researched, but their design is currently guided
mainly by monitoring the average model test error. This can render different models indistinguishable
although their performance differs substantially across materials, or it can make
a model appear generally insufficient while it actually works well in specific sub-domains.
Here, we present a method, based on subgroup discovery, for detecting domains of applicability
(DA) of models within a materials class. The utility of this approach is demonstrated
by analyzing three state-of-the-art ML models for predicting the formation energy of
transparent conducting oxides. We find that, despite having a mutually indistinguishable and
unsatisfactory average error, the models have DAs with distinctive features and notably
improved performance.
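The point about average test error can be illustrated with a toy calculation. The sketch below is not the authors' subgroup discovery method: the error values are synthetic, the single feature `x` is hypothetical, and the objective (coverage times relative error reduction) is an assumed, simplified stand-in. It shows two models with identical average error whose low-error regions differ, and a brute-force search over one-sided threshold selectors that recovers each region.

```python
from statistics import mean

# Synthetic absolute test errors (made-up numbers, not taken from the paper).
# Each record: (feature value x, abs. error of model A, abs. error of model B).
records = [
    (0.1, 0.02, 0.10),
    (0.2, 0.02, 0.10),
    (0.3, 0.02, 0.10),
    (0.7, 0.10, 0.02),
    (0.8, 0.10, 0.02),
    (0.9, 0.10, 0.02),
]


def mean_error(rows, col):
    """Average absolute error of the model stored in column `col`."""
    return mean(r[col] for r in rows)


def best_threshold_domain(rows, col):
    """Search selectors 'x <= t' and 'x > t' over observed values of x.

    Score (an assumption for this sketch) = coverage * relative error
    reduction with respect to the global average error.
    Returns (best score, selector description).
    """
    global_err = mean_error(rows, col)
    best = (0.0, None)
    for t in sorted({r[0] for r in rows}):
        candidates = (
            (f"x <= {t}", lambda r, t=t: r[0] <= t),
            (f"x > {t}", lambda r, t=t: r[0] > t),
        )
        for name, selects in candidates:
            subset = [r for r in rows if selects(r)]
            if not subset:
                continue
            coverage = len(subset) / len(rows)
            reduction = 1.0 - mean_error(subset, col) / global_err
            score = coverage * reduction
            if score > best[0]:
                best = (score, name)
    return best


if __name__ == "__main__":
    print("average error A:", mean_error(records, 1))
    print("average error B:", mean_error(records, 2))
    print("domain for A:", best_threshold_domain(records, 1)[1])
    print("domain for B:", best_threshold_domain(records, 2)[1])
```

On this toy data the two models are indistinguishable by average error, yet the search selects opposite halves of the feature range for them, which is the kind of distinction a single averaged metric cannot show.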
Preferred Citation
Christopher Sutton, Mario Boley, Luca M. Ghiringhelli, Matthias Rupp, Jilles Vreeken and Matthias Scheffler. Identifying Domains of Applicability of Machine Learning Models for Materials Science. In: Nature Communications. 2020.
Primary Research Area
Trustworthy Information Processing
Legacy Posted Date
2020-10-15
Journal
Nature Communications
Open Access Type
Gold
Sub Type
Article
BibTeX
@article{cispa_all_3252,
title = "Identifying Domains of Applicability of Machine Learning Models for Materials Science",
author = "Sutton, Christopher and Boley, Mario and Ghiringhelli, Luca M. and Rupp, Matthias and Vreeken, Jilles and Scheffler, Matthias",
journal = "{Nature Communications}",
year = "2020",
}