BMC Oral Health, vol. 26, no. 1, 2026 (SCI-Expanded, Scopus)
Background: This study aimed to perform the first multi-architecture comparison of pixel-level mandibular condyle segmentation on panoramic radiographs using transformer-based (RT-DETR), CNN-based (EfficientNet, Mask R-CNN, ConvNeXt), and YOLO-based (YOLOv9-Seg, YOLOv11-Seg) deep learning models.

Methods: A dataset of 1,300 panoramic radiographs (2,600 condyles) was retrospectively curated. Ground-truth masks were annotated by a primary radiologist and reviewed by a senior radiologist; inter-observer agreement was quantified on a blinded 10% subset (Dice: 0.92 ± 0.03). Six state-of-the-art architectures were trained and evaluated on a fixed test set. Performance was assessed using Intersection over Union (IoU), Dice Similarity Coefficient (DSC), precision, recall, and F1-score.

Results: All models achieved high segmentation accuracy, with DSC values ranging from 0.819 to 0.866. The transformer-based RT-DETR model showed the highest numerical DSC (0.866), IoU (0.764), and F1-score (0.866), indicating a balanced overall segmentation profile. Among the one-stage detectors, YOLOv9-Seg provided competitive results (DSC: 0.862) with high recall (0.902), outperforming CNN-based alternatives. YOLOv11-Seg showed high sensitivity but lower precision compared to other architectures.

Conclusions: Deep learning enables accurate and automated condylar segmentation on panoramic radiographs. While RT-DETR showed favorable anatomical fidelity for quantitative morphometry, YOLOv9-Seg presented a viable real-time alternative. This study establishes a benchmark for selecting segmentation architectures tailored to specific clinical needs in TMJ analysis.

Trial registration: Not applicable.
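The overlap metrics reported above (IoU and DSC) have standard definitions for binary segmentation masks. The following is a minimal illustrative sketch of those definitions, not the study's evaluation code; the toy masks are hypothetical examples:

```python
def dice(pred, gt):
    """Dice Similarity Coefficient: 2|A∩B| / (|A| + |B|) for flat 0/1 masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    return 2 * inter / (sum(pred) + sum(gt))


def iou(pred, gt):
    """Intersection over Union: |A∩B| / |A∪B| for flat 0/1 masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union


# Toy 2x3 masks (flattened row-major): prediction shifted one column right of ground truth.
pred = [1, 1, 0,
        1, 1, 0]
gt   = [0, 1, 1,
        0, 1, 1]

print(dice(pred, gt))  # 0.5  (intersection 2, |pred| + |gt| = 8)
print(iou(pred, gt))   # 0.333... (intersection 2, union 6)
```

Note that DSC is always at least as large as IoU for the same pair of masks (DSC = 2·IoU / (1 + IoU)), which is consistent with the paired values reported above, e.g. RT-DETR's DSC 0.866 versus IoU 0.764.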