BMC Oral Health, vol. 26, no. 1, 2026 (SCI-Expanded, Scopus)
Background: This study aimed to perform the first multi-architecture comparison of pixel-level mandibular condyle segmentation on panoramic radiographs using transformer-based (RT-DETR), CNN-based (EfficientNet, Mask R-CNN, ConvNeXt), and YOLO-based (YOLOv9-Seg, YOLOv11-Seg) deep learning models.

Methods: A dataset of 1,300 panoramic radiographs (2,600 condyles) was retrospectively curated. Ground-truth masks were annotated by a primary radiologist and reviewed by a senior radiologist; inter-observer agreement was quantified on a blinded 10% subset (Dice: 0.92 ± 0.03). Six state-of-the-art architectures were trained and evaluated on a fixed test set. Performance was assessed using Intersection over Union (IoU), Dice Similarity Coefficient (DSC), precision, recall, and F1-score.

Results: All models achieved high segmentation accuracy, with DSC values ranging from 0.819 to 0.866. The transformer-based RT-DETR model showed the highest numerical DSC (0.866), IoU (0.764), and F1-score (0.866), indicating a balanced overall segmentation profile. Among the one-stage detectors, YOLOv9-Seg provided competitive results (DSC: 0.862) with high recall (0.902), outperforming CNN-based alternatives. YOLOv11-Seg showed high sensitivity but lower precision compared to other architectures.

Conclusions: Deep learning enables accurate and automated condylar segmentation on panoramic radiographs. While RT-DETR showed favorable anatomical fidelity for quantitative morphometry, YOLOv9-Seg presented a viable real-time alternative. This study establishes a benchmark for selecting segmentation architectures tailored to specific clinical needs in TMJ analysis.

Trial registration: Not applicable.
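The overlap metrics reported above (IoU and DSC) have standard definitions for binary segmentation masks. The following is a minimal illustrative sketch of those definitions, not the study's evaluation code; the toy masks are hypothetical examples:

```python
def dice(pred, gt):
    """Dice Similarity Coefficient: 2|A∩B| / (|A| + |B|) for flat 0/1 masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    return 2 * inter / (sum(pred) + sum(gt))


def iou(pred, gt):
    """Intersection over Union: |A∩B| / |A∪B| for flat 0/1 masks."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    return inter / union


# Toy 2x3 masks (flattened row-major): prediction shifted one column right of ground truth.
pred = [1, 1, 0,
        1, 1, 0]
gt   = [0, 1, 1,
        0, 1, 1]

print(dice(pred, gt))  # 0.5  (intersection 2, |pred| + |gt| = 8)
print(iou(pred, gt))   # 0.333... (intersection 2, union 6)
```

Note that DSC is always at least as large as IoU for the same pair of masks (DSC = 2·IoU / (1 + IoU)), which is consistent with the paired values reported above, e.g. RT-DETR's DSC 0.866 versus IoU 0.764.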