Investigating Tree-Based Models Across Positive–Unlabeled Learning Frameworks for Crystal Synthesizability Prediction

Aydın, Ayhan; Eryılmaz, Ümit; Alkan, Onur; Kocagöz, Pınar; Açıcı, Koray; Ekinci, Fatih; Güzel, Mehmet

doi:10.1002/adts.70395

Aydın A., Eryılmaz Ü. K., Alkan O. B., Kocagöz P., Açıcı K., Ekinci F., et al.

ADVANCED THEORY AND SIMULATIONS, vol. 9, no. 4, pp. 1-13, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 9 Issue: 4
Publication Date: 2026
DOI Number: 10.1002/adts.70395
Journal Name: ADVANCED THEORY AND SIMULATIONS
Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), Compendex, INSPEC
Page Numbers: pp. 1-13
Affiliated with Ankara University: Yes

Abstract

The computational prediction of crystal synthesizability remains a major challenge in data-driven materials discovery, as most entries in large materials databases correspond to theoretical structures without experimental validation. This asymmetry creates a learning scenario in which only experimentally synthesized crystals are reliably labeled, while most structures remain unlabeled and ambiguous. In this work, crystal synthesizability prediction is formulated within a Positive–Unlabeled (PU) learning framework rather than as a conventional binary classification problem. We benchmark nine PU learning strategies combined with four tree-based machine learning models, including Random Forest, LightGBM, XGBoost, and CatBoost, using approximately 130 000 crystal structures from the Materials Project database. After physically consistent data preprocessing and feature selection, model performance is evaluated using discovery-oriented metrics, with Precision@200 as the primary criterion. The results show that PU formulations outperform naïve classification approaches in early-stage discovery, enabling more reliable prioritization of experimentally synthesizable materials. Model explanation analysis reveals that thermodynamic stability, formation energy, structural compactness, and magnetic descriptors dominate the learned decision mechanisms. Overall, this study provides a comparative evaluation of PU learning strategies and demonstrates their effectiveness as a discovery-oriented framework for identifying experimentally viable crystal structures.
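
For readers unfamiliar with the primary criterion named in the abstract, the following is a minimal sketch of how a Precision@k score is commonly computed: candidates are ranked by model score, and the metric is the fraction of the top k that are known positives. This is not the authors' code. The function name `precision_at_k`, its parameters, and the toy data are hypothetical, and the exact evaluation protocol used in the paper (for example, how held-out positives are defined) is not stated in the abstract.

```python
def precision_at_k(scores, labels, k=200):
    """Fraction of the k highest-scored candidates that are labeled positive.

    scores: model scores, higher meaning "more likely synthesizable" (assumed).
    labels: 1 for known positives (e.g. held-out synthesized crystals), 0 otherwise.
    k:      size of the shortlist; 200 mirrors Precision@200 in the abstract.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if k <= 0 or len(scores) == 0:
        raise ValueError("k must be positive and scores must be non-empty")

    # Rank candidates from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # If fewer than k candidates exist, the whole ranking is the shortlist.
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy data (made up for illustration only, not results from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
toy_labels = [1, 0, 1, 1, 0]
toy_p_at_3 = precision_at_k(toy_scores, toy_labels, k=3)
```

In a discovery setting, a metric of this kind rewards models that place synthesizable structures at the very top of the ranking, which is where experimental follow-up would start.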