A comparative analysis of single- and dual-backbone deep learning architectures with explainable AI for cherry leaf disease classification


Altay H. T., Demir Ö., Ekinci F., Güzel M. S., Kumru E., Akata I., et al.

SCIENTIFIC REPORTS, vol.1, no.1, pp.1-41, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 1 Issue: 1
  • Publication Date: 2026
  • DOI: 10.1038/s41598-026-50104-1
  • Journal Name: SCIENTIFIC REPORTS
  • Journal Indexes: Scopus, Science Citation Index Expanded (SCI-EXPANDED), BIOSIS, Chemical Abstracts Core, MEDLINE, Directory of Open Access Journals
  • Page Numbers: pp.1-41
  • Ankara University Affiliated: Yes

Abstract

Accurate differentiation of visually similar cherry leaf diseases remains a major challenge in precision agriculture due to overlapping symptom patterns and environmental variability. This study presents a comprehensive deep learning–based framework for multi-class cherry leaf disease classification, integrating systematic architectural comparison, statistical validation, and explainable artificial intelligence (XAI) analysis. Contrary to the common assumption that increased architectural complexity enhances performance, our results show that dual-backbone architectures consistently fail to outperform single-backbone models. A dataset comprising 4,995 cherry leaf images across five categories—brown spot, leaf scorch, healthy leaf, purple leaf spot, and shot hole disease—was used to evaluate multiple convolutional neural network architectures under fully standardized conditions. ResNet50 achieved the highest classification accuracy (98.20%), followed by EfficientNetB2 (98.00%) and DenseNet121 (97.50%), while the best dual-backbone model reached only 97.30% despite increased complexity. Statistical analysis using the Wilcoxon signed-rank test revealed a significant discrepancy between overall accuracy and macro-averaged recall (p = 0.00195, r = 0.89), demonstrating that accuracy systematically overestimates class-wise detection performance in multi-class scenarios. Grad-CAM–based explainability analysis further revealed that DenseNet-based models produce compact and semantically coherent activation maps aligned with disease-relevant regions, whereas dual-backbone architectures exhibit fragmented attention patterns associated with feature redundancy and gradient interference. These findings indicate that interpretability fidelity does not scale with architectural complexity and that coherent single-backbone feature hierarchies provide a superior balance between performance, interpretability, and generalization. The proposed framework offers both methodological and practical insights for developing reliable and scalable artificial intelligence systems in agricultural disease diagnostics.
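The abstract's central statistical point — that overall accuracy can significantly overestimate macro-averaged recall when one class is rare — can be sketched as follows. This is not the paper's code: the class sizes, error pattern, and per-model metric values below are hypothetical, chosen only to illustrate the mechanism and the paired Wilcoxon comparison (with all paired differences sharing a sign across 10 models, the exact two-sided test yields p ≈ 0.00195, matching the value reported).

```python
# Hypothetical sketch (not the paper's code): shows how overall accuracy
# can exceed macro-averaged recall under class imbalance, and how a
# Wilcoxon signed-rank test compares the two metrics across models.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Imbalanced 5-class ground truth: four common classes, one rare class
# (class sizes are illustrative, not the paper's 4,995-image dataset).
y_true = np.concatenate([np.full(400, c) for c in range(4)] + [np.full(20, 4)])

# A classifier that does well on common classes but misses the rare one.
y_pred = y_true.copy()
rare = np.where(y_true == 4)[0]
y_pred[rare[:15]] = 0                      # most rare-class samples missed
flip = rng.choice(np.where(y_true < 4)[0], size=30, replace=False)
y_pred[flip] = (y_true[flip] + 1) % 4      # a few errors on common classes

acc = accuracy_score(y_true, y_pred)
macro_rec = recall_score(y_true, y_pred, average="macro")
print(f"accuracy     = {acc:.4f}")         # inflated by the common classes
print(f"macro recall = {macro_rec:.4f}")   # dragged down by the rare class

# Paired per-model metrics for 10 hypothetical CNNs: every model's
# accuracy exceeds its macro recall, so the exact two-sided Wilcoxon
# signed-rank p-value is 2 / 2**10 ≈ 0.00195.
acc_per_model = np.array([.982, .980, .975, .973, .970,
                          .968, .965, .962, .958, .955])
rec_per_model = acc_per_model - rng.uniform(0.005, 0.03, size=10)
stat, p = wilcoxon(acc_per_model, rec_per_model)
print(f"Wilcoxon p = {p:.5f}")
```

The gap between the two printed metrics is exactly the phenomenon the abstract describes: each correct common-class prediction counts equally toward accuracy, while macro recall weights every class the same regardless of size.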