Stress-Testing Multimodal Foundation Models for Crystallographic Reasoning


Polat C., Kurban H., Serpedin E., KURBAN M.

3rd Workshop on Towards Knowledgeable Foundation Models-KnowFM, Vienna, Austria, 1 August 2025, pp. 49-58 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Vienna
  • Country of Publication: Austria
  • Pages: pp. 49-58
  • Affiliated with Ankara University: No

Abstract

Evaluating foundation models for crystallographic reasoning requires benchmarks that isolate generalization behavior while enforcing physical constraints. This work introduces xCrysAlloys, a multiscale, multicrystal dataset with two physically grounded evaluation protocols to stress-test multimodal generative models. The Spatial-Exclusion benchmark withholds all supercells of a given radius from a diverse dataset, enabling controlled assessments of spatial interpolation and extrapolation. The Compositional-Exclusion benchmark omits all samples of a specific chemical composition, probing generalization across stoichiometries. Nine vision-language foundation models are prompted with crystallographic images and textual context to generate structural annotations. Responses are evaluated via (i) relative errors in lattice parameters and density, (ii) a physics-consistency index penalizing volumetric violations, and (iii) a hallucination score capturing geometric outliers and invalid space-group predictions. These benchmarks establish a reproducible, physically informed framework for assessing generalization, consistency, and reliability in large-scale multimodal models. Dataset and implementation are available at https://github.com/KurbanIntelligenceLab/StressTestingMMFMinCR.
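
The two exclusion protocols and the relative-error metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Sample` record, its field names, and the helper functions `exclusion_split` and `relative_error` are hypothetical and chosen only for illustration, and the abstract does not specify the exact formulas used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one supercell; field names are assumptions."""
    composition: str  # chemical composition label
    radius: int       # supercell radius (arbitrary units)
    density: float    # reference density (arbitrary units)


def exclusion_split(samples, key, held_out):
    """Withhold every sample whose `key` attribute equals `held_out`.

    key="radius" mimics a spatial-exclusion split;
    key="composition" mimics a compositional-exclusion split.
    Returns (kept, withheld).
    """
    kept = [s for s in samples if getattr(s, key) != held_out]
    withheld = [s for s in samples if getattr(s, key) == held_out]
    return kept, withheld


def relative_error(predicted, reference):
    """Absolute relative error |predicted - reference| / |reference|."""
    return abs(predicted - reference) / abs(reference)


# Toy data, invented purely for the example.
samples = [
    Sample("A", 1, 2.0),
    Sample("A", 2, 2.0),
    Sample("B", 1, 4.0),
    Sample("B", 2, 4.0),
]

kept_r, withheld_r = exclusion_split(samples, "radius", 2)
kept_c, withheld_c = exclusion_split(samples, "composition", "B")
err = relative_error(5.0, 4.0)
```

Using a single function for both protocols reflects that each one holds out all samples sharing one attribute value; only the attribute differs.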