Evaluating ChatGPT responses to patient-oriented questions on one-stage revision arthroplasty for periprosthetic joint infection


Terzi M. M., KOCAOĞLU H., ÇINAR G., Sandiford N., Citak M.

International Orthopaedics, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2026
  • DOI Number: 10.1007/s00264-026-06806-2
  • Journal Name: International Orthopaedics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Abstracts in Social Gerontology, CINAHL, EMBASE, MEDLINE
  • Keywords: ChatGPT, Large language model, One-stage revision, Patient education, Periprosthetic joint infection, Revision arthroplasty
  • Ankara University Affiliated: Yes

Abstract

Background: Large language model-based chatbots are increasingly used by patients seeking information about periprosthetic joint infection (PJI) and revision strategies, yet the quality of patient-facing answers on one-stage revision remains uncertain.

Methods: This expert-based, cross-sectional exploratory study evaluated ChatGPT-generated answers to 12 patient-oriented questions on one-stage revision arthroplasty for PJI. Questions were purposively selected from a pool of 30 commonly asked items to maximize topic coverage while minimizing redundancy. All questions were entered verbatim into ChatGPT on January 3, 2026 (freely accessible web interface; GPT-5.2), using a new session per question; only the first response was recorded, without follow-up prompts or browsing. Four raters (two senior orthopaedic surgeons, one junior orthopaedic surgeon, and one infectious diseases specialist) independently graded each response using a predefined ordinal rubric and recorded brief comments when limitations were identified. Inter-rater reliability was assessed with Krippendorff's alpha (ordinal).

Results: Responses were rated positively overall. Answers addressing procedural steps, postoperative antibiotic management, and recovery expectations received the most consistently high ratings. Clarification was most frequently requested in domains where decision-making is conditional or evolving, including indications/contraindications and patient selection, framing of culture-negative PJI, protocol-dependent weight-bearing recommendations, and management options after failure. Inter-rater agreement was modest (α = 0.375; 95% CI −0.012 to 0.651).

Conclusion: In this exploratory study, ChatGPT provided generally clear patient-facing explanations of one-stage revision PJI topics, with stronger performance in standardized, procedural domains and weaker performance in areas requiring individualized clinical judgment. These findings suggest that ChatGPT may have a supportive role in patient education, although clinician oversight remains important to ensure appropriate contextualization and to avoid overgeneralization.