ADVANCED THEORY AND SIMULATIONS, vol. 9, no. 4, pp. 1-13, 2026 (SCI-Expanded, Scopus)
The computational prediction of crystal synthesizability remains a major challenge in data-driven materials discovery, as most entries in large materials databases correspond to theoretical structures without experimental validation. This asymmetry creates a learning scenario in which only experimentally synthesized crystals are reliably labeled, while most structures remain unlabeled and ambiguous. In this work, crystal synthesizability prediction is formulated within a Positive–Unlabeled (PU) learning framework rather than as a conventional binary classification problem. We benchmark nine PU learning strategies combined with four tree-based machine learning models, including Random Forest, LightGBM, XGBoost, and CatBoost, using approximately 130 000 crystal structures from the Materials Project database. After physically consistent data preprocessing and feature selection, model performance is evaluated using discovery-oriented metrics, with Precision@200 as the primary criterion. The results show that PU formulations outperform naïve classification approaches in early-stage discovery, enabling more reliable prioritization of experimentally synthesizable materials. Model explanation analysis reveals that thermodynamic stability, formation energy, structural compactness, and magnetic descriptors dominate the learned decision mechanisms. Overall, this study provides a comparative evaluation of PU learning strategies and demonstrates their effectiveness as a discovery-oriented framework for identifying experimentally viable crystal structures.
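As an illustration of the primary criterion named above, Precision@k can be read as the fraction of known positives among the k highest-scored candidates. The following is a minimal sketch under that common definition, not the evaluation code used in the study: the function name `precision_at_k`, its parameters, and the tie-breaking rule are assumptions introduced here, and the abstract does not specify how held-out positives are chosen.

```python
from typing import Sequence


def precision_at_k(
    scores: Sequence[float],
    is_positive: Sequence[bool],
    k: int = 200,
) -> float:
    """Fraction of known positives among the k highest-scored candidates.

    Hypothetical helper: `scores` are model outputs (higher means more
    likely synthesizable) and `is_positive` marks held-out structures
    known to be experimentally synthesized. Both names are assumptions.
    """
    if len(scores) != len(is_positive):
        raise ValueError("scores and is_positive must have the same length")
    if k <= 0:
        raise ValueError("k must be positive")

    # Never look at more candidates than exist.
    k = min(k, len(scores))
    if k == 0:
        return 0.0

    # Rank indices by descending score; the sort is stable, so tied
    # candidates keep their input order (an assumed tie-breaking rule).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Count how many of the top-k candidates are known positives.
    hits = sum(1 for i in ranked[:k] if is_positive[i])
    return hits / k
```

In a PU setting, unlabeled structures counted as non-positives may in fact be synthesizable, so a value computed this way is best read as a lower bound on the true precision of the top-k list.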