Evaluating ChatGPT responses to patient-oriented questions on one-stage revision arthroplasty for periprosthetic joint infection


Terzi M. M., KOCAOĞLU H., ÇINAR G., Sandiford N., Citak M.

International Orthopaedics, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI Number: 10.1007/s00264-026-06806-2
  • Journal Name: International Orthopaedics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Abstracts in Social Gerontology, CINAHL, EMBASE, MEDLINE
  • Keywords: ChatGPT, Large language model, One-stage revision, Patient education, Periprosthetic joint infection, Revision arthroplasty
  • Ankara University Affiliated: Yes

Abstract

Background: Large language model-based chatbots are increasingly used by patients seeking information about periprosthetic joint infection (PJI) and revision strategies, yet the quality of patient-facing answers on one-stage revision remains uncertain.

Methods: This expert-based, cross-sectional exploratory study evaluated ChatGPT-generated answers to 12 patient-oriented questions on one-stage revision arthroplasty for PJI. Questions were purposively selected from a pool of 30 commonly asked items to maximize topic coverage while minimizing redundancy. All questions were entered verbatim into ChatGPT on January 3, 2026 (freely accessible web interface; GPT-5.2), using a new session per question; only the first response was recorded, without follow-up prompts or browsing. Four raters (two senior orthopaedic surgeons, one junior orthopaedic surgeon, and one infectious diseases specialist) independently graded each response using a predefined ordinal rubric and recorded brief comments when limitations were identified. Inter-rater reliability was assessed with Krippendorff's alpha (ordinal).

Results: Responses were rated positively overall. Answers addressing procedural steps, postoperative antibiotic management, and recovery expectations received the most consistently high ratings. Clarification was most frequently requested in domains where decision-making is conditional or evolving, including indications/contraindications and patient selection, framing of culture-negative PJI, protocol-dependent weight-bearing recommendations, and management options after failure. Inter-rater agreement was modest (α = 0.375; 95% CI −0.012 to 0.651).

Conclusion: In this exploratory study, ChatGPT provided generally clear patient-facing explanations of one-stage revision PJI topics, with stronger performance in standardized, procedural domains and weaker performance in areas requiring individualized clinical judgment. These findings suggest that ChatGPT may have a supportive role in patient education, although clinician oversight remains important to ensure appropriate contextualization and to avoid overgeneralization.