Dresden 2026 – scientific programme
MM: Fachverband Metall- und Materialphysik
MM 13: Data-driven Materials Science: Big Data and Workflows I
MM 13.6: Talk
Tuesday, March 10, 2026, 11:45–12:00, SCH/A251
Efficient Exploration of the Unknown: Distance-Based Active Learning with SISSO Descriptors and Mendeleev Similarities for Materials Discovery
•Sreejani Karmakar¹, Akhil S. Nair¹,², Lucas Foppa¹, and Matthias Scheffler¹
¹Fritz Haber Institute of the Max Planck Society, Berlin, Germany; ²Freie Universität Berlin, Berlin, Germany
The performance of AI models depends strongly on the distribution of their training data, which should ideally be independent and identically distributed (i.i.d.). Materials-science datasets often violate this condition: they contain redundancy and bias that hinder the discovery of statistically rare, high-performance materials. Active learning (AL) mitigates this by building concise, diverse training sets that include underrepresented materials classes. AL commonly relies on uncertainty estimates derived from the variance of model ensembles [1], but these estimates are frequently overconfident, which limits AL efficiency. We introduce an alternative strategy that selects candidate materials based on their distance from the existing training set in a low-dimensional descriptor space [2]. The descriptors are obtained with the SISSO (sure independence screening and sparsifying operator) symbolic-regression approach. This distance-guided approach outperforms ensemble-uncertainty-based AL and identifies perovskites with exceptional properties. Adding the Mendeleev similarity metric further improves dataset diversity and supports efficient navigation of unexplored materials space.
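To make the distance-based selection idea concrete, here is a minimal sketch, assuming the descriptors (e.g. SISSO features) have already been computed as plain numeric vectors. It greedily picks the pool candidate whose minimum Euclidean distance to the current training set is largest, then treats that pick as part of the training set before choosing the next one, similar in spirit to the greedy input-space sampling of ref. [2]. The helper name `select_by_distance`, the Euclidean metric, and the per-dimension standardization with training-set statistics are illustrative assumptions, not the authors' implementation; the Mendeleev similarity term is not modeled here.

```python
import numpy as np


def select_by_distance(train_desc, pool_desc, n_select=1):
    """Hypothetical helper: greedy max-min distance selection.

    Each pick maximizes the minimum Euclidean distance to all training
    points, including candidates already picked in this batch.
    Returns a list of pool indices in selection order.
    """
    train = np.asarray(train_desc, dtype=float)
    pool = np.asarray(pool_desc, dtype=float)

    # Assumption: standardize each descriptor dimension with training-set
    # statistics so no single descriptor dominates the distance.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    train = (train - mean) / std
    pool = (pool - mean) / std

    # Minimum distance from every pool candidate to the training set.
    diff = pool[:, None, :] - train[None, :, :]
    min_dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

    selected = []
    for _ in range(min(n_select, len(pool))):
        idx = int(np.argmax(min_dist))
        selected.append(idx)
        # The chosen candidate now counts as training data: shrink the
        # remaining candidates' minimum distances accordingly.
        new_dist = np.sqrt(((pool - pool[idx]) ** 2).sum(axis=1))
        min_dist = np.minimum(min_dist, new_dist)
        min_dist[idx] = -np.inf  # never select the same candidate twice
    return selected


train = [[0.0, 0.0], [1.0, 0.0]]
pool = [[0.5, 0.0], [5.0, 5.0], [5.0, 5.1], [-3.0, 0.0]]
print(select_by_distance(train, pool, n_select=2))  # [2, 3]
```

In this toy example, candidates 1 and 2 are both far from the training data, but they are near-duplicates of each other. Ranking by distance alone would pick both; the greedy update instead picks candidate 2 and then the distinct candidate 3, which is the diversity-promoting behavior the abstract attributes to distance-guided AL.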
[1] A. Nair et al., npj Comput. Mater. 11, 150 (2025).
[2] D. Wu et al., Inf. Sci. 474, 90–105 (2019).
Keywords: active learning; symbolic regression; materials discovery; perovskites
