Optimizing AI for surgical aftercare: Claude 3.5 Sonnet outperforms ChatGPT-5.0 in otoplasty


Sari E., Ahmadov N., Muradova A.

Egyptian Journal of Otolaryngology, vol.42, no.1, 2026 (ESCI, Scopus)

  • Publication Type: Article
  • Volume: 42 Issue: 1
  • Publication Date: 2026
  • Doi Number: 10.1186/s43163-026-01038-y
  • Journal Name: Egyptian Journal of Otolaryngology
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus
  • Keywords: Artificial intelligence, ChatGPT-5.0, Claude 3.5 Sonnet, Otoplasty, Postoperative care
  • Ankara University Affiliated: Yes

Abstract

Background: Artificial intelligence (AI) language models are increasingly used in surgical aftercare, yet their performance varies across platforms. This study compares the effectiveness of two large language models, Claude 3.5 Sonnet and ChatGPT-5.0, in providing accurate, clinically relevant guidance for postoperative otoplasty care.

Methods: Ten commonly encountered postoperative otoplasty questions were presented to both models. The generated answers were independently assessed by ten ENT specialists using structured Likert-based instruments and a predefined clinical evaluation. To evaluate reliability and inter-model differences, a range of statistical techniques was applied, including t-tests, effect size calculations, sensitivity and specificity analyses, mixed-effects models, and regression-based modeling.

Results: Claude 3.5 Sonnet outperformed ChatGPT-5.0 across all evaluation metrics (p < 0.001). Mixed-effects modeling showed a positive model effect (β = 0.752), and question-level ROC analysis demonstrated complete separation (AUC = 1.00). PCA supported a dominant single factor explaining 70.86% of the variance in clinician ratings, and inter-rater agreement was higher for Claude 3.5 Sonnet.

Conclusion: The Claude 3.5 Sonnet model exhibited higher accuracy and clinical relevance than ChatGPT-5.0 in postoperative otoplasty management, with robust statistical validation supporting its reliability in surgical aftercare.
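
As an illustration of what "complete separation (AUC = 1.00)" and an effect size calculation mean at the question level, the following is a minimal Python sketch. It is not the authors' analysis code: the ratings are invented for illustration, the per-question mean rating as the unit of analysis is an assumption, and the function names `auc` and `cohens_d` are hypothetical helpers. AUC is computed here in its rank-based form, as the probability that a randomly chosen score from one group exceeds a randomly chosen score from the other.

```python
from statistics import mean, stdev


def auc(pos, neg):
    """Rank-based AUC: P(score from pos > score from neg), ties count as half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


# Hypothetical per-question mean ratings on a 5-point Likert scale.
# These numbers are made up for illustration and are NOT data from the study.
model_a = [4.6, 4.4, 4.7, 4.5, 4.8, 4.3, 4.6, 4.5, 4.7, 4.4]
model_b = [3.9, 3.7, 4.0, 3.8, 4.1, 3.6, 3.9, 3.8, 4.0, 3.7]

# Every model_a score exceeds every model_b score, so the two groups
# are completely separated and the AUC reaches its maximum of 1.0.
print("AUC:", auc(model_a, model_b))
print("Cohen's d:", round(cohens_d(model_a, model_b), 2))
```

An AUC of 1.00 therefore says only that the two sets of scores do not overlap; it does not by itself indicate how large the gap is, which is why an effect size such as Cohen's d is typically reported alongside it.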