CrysMTM: a multiphase, temperature-resolved, multimodal dataset for crystalline materials


Polat C., Serpedin E., KURBAN M., Kurban H.

MACHINE LEARNING-SCIENCE AND TECHNOLOGY, vol. 6, no. 3, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 6 Issue: 3
  • Publication Date: 2025
  • DOI: 10.1088/2632-2153/adf9bc
  • Journal Name: MACHINE LEARNING-SCIENCE AND TECHNOLOGY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Affiliated with Ankara University: Yes

Abstract

We present CrysMTM, a large-scale, multimodal dataset designed to benchmark temperature- and phase-sensitive machine learning models for crystalline materials. The dataset comprises approximately 30 000 atomistic samples covering the three primary polymorphs of titanium dioxide (anatase, brookite, and rutile), each evaluated across a temperature spectrum ranging from cryogenic to ambient and elevated conditions. Each data entry integrates three complementary modalities: (1) three-dimensional atomic coordinates, (2) RGBA molecular visualizations, and (3) structured textual metadata encompassing geometric descriptors, local bonding environments, and phase transformation parameters. This multimodal structure enables both supervised and self-supervised learning across graph-based, image-based, and language-based architectures. CrysMTM supports rigorous evaluation of model robustness under thermal perturbations and crystallographic phase transitions. Baseline benchmarking across 18 models, including graph neural networks (GNNs), convolutional neural networks, and foundation models, reveals significant property-specific challenges. For example, bandgap predictions exhibit errors exceeding 25%, while volumetric expansion and atomic displacement estimations frequently deviate by more than 100%. Even state-of-the-art GNNs, which achieve an average in-distribution (ID) mean absolute percentage error of approximately 20%, show a threefold increase under out-of-distribution (OOD) thermal conditions. In contrast, a few-shot multimodal large language model reduces global prediction error from 96% to 23% and narrows the performance gap between ID and OOD cases to just four percentage points. These results highlight both the selective difficulty posed by temperature-sensitive geometric targets and the considerable room for innovation in model design. All dataset files, model implementations, and pretrained checkpoints are available at https://github.com/KurbanIntelligenceLab/CrysMTM.
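
The abstract reports errors as mean absolute percentage error (MAPE) and compares in-distribution (ID) with out-of-distribution (OOD) performance in percentage points. The following is a minimal sketch of how such figures are conventionally computed. It is not taken from the CrysMTM repository: the function names `mape` and `id_ood_gap` and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return the mean absolute percentage error, expressed in percent.

    Standard definition: 100 * mean(|(true - pred) / true|).
    Targets equal to zero are rejected because the ratio is undefined.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined for zero-valued targets")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def id_ood_gap(id_error: float, ood_error: float) -> float:
    """Return the OOD-minus-ID error difference in percentage points."""
    return ood_error - id_error


# Hypothetical target and predicted values (illustration only, not
# data from the paper): one ID split and one OOD split.
id_true, id_pred = [3.0, 3.2, 3.1], [2.7, 3.52, 3.1]
ood_true, ood_pred = [3.0, 2.0], [3.6, 1.5]

id_error = mape(id_true, id_pred)
ood_error = mape(ood_true, ood_pred)
print(f"ID MAPE:  {id_error:.2f}%")
print(f"OOD MAPE: {ood_error:.2f}%")
print(f"Gap:      {id_ood_gap(id_error, ood_error):.2f} percentage points")
```

Reporting the gap in percentage points rather than as a ratio keeps the ID/OOD comparison readable when the ID error itself is already large.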