International Journal of Retina and Vitreous, vol.12, no.1, 2026 (ESCI, Scopus)
Background: To evaluate the accuracy and readability of answers to common retinitis pigmentosa (RP) questions from the popular generative artificial intelligence (AI) chatbots ChatGPT-4 and Gemini-2.0.
Methods: In March 2025, frequently asked questions about RP were entered into the Google search tool, and the websites appearing on the first search page were selected for enrollment in the study. ChatGPT-4 and Gemini-2.0 were then prompted to generate responses about RP in both standard and simplified formats. To generate the simplified response, the following request was added to the prompt: ‘Please provide a response suitable for the average American adult, at a sixth-grade comprehension level.’ The AI chatbots’ responses to 30 questions about RP frequently asked by patients were evaluated by two ophthalmologists using a five-point Likert scale (scores from 1 to 5). Additionally, eight readability indices were calculated using an online calculator, Readabilityformulas.com, to assess the ease of comprehension of each answer: Average Reading Level Consensus Calculator (ARLC), Automated Readability Index (ARI), Flesch Reading Ease (FRE), Gunning Fog Index (GFOG), Flesch–Kincaid Grade Level (FKGL), Coleman–Liau Index (CL), Simple Measure of Gobbledygook (SMOG), and Forcast Readability Formula (FRF).
Results: No significant difference in accuracy was observed for either the standard or the simplified AI chatbot responses (p = 0.557 and p = 0.090, respectively). Almost all readability indices suggested that the standard AI chatbot responses require a higher level of education for comprehension, whereas the simplified responses require a lower level of education. Gemini-2.0 standard responses were more readable than ChatGPT-4 standard responses according to ARI, GFOG, and FRF scores (p = 0.014, p = 0.040, and p = 0.001, respectively), whereas Gemini-2.0 simplified responses were more readable than ChatGPT-4 simplified responses solely according to FRF scores (p = 0.016).
Conclusions: This study shows that ChatGPT-4 and Gemini-2.0 can provide patients with an avenue to access comprehensive and accurate information about RP, tailored to their educational level.
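
As an illustration of how three of the readability indices named in the Methods are defined, the following minimal sketch computes FRE, FKGL, and ARI from raw text counts using their standard published formulas. The study itself used the Readabilityformulas.com calculator; the function names and the example counts here are hypothetical, and the counting of sentences, words, and syllables (which differs between tools) is assumed to have been done beforehand.

```python
# Minimal sketch of three standard readability formulas.
# Inputs are pre-computed counts; how a given tool counts syllables or
# sentences is not specified here and may differ from the study's calculator.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(chars: int, words: int, sentences: int) -> float:
    """Automated Readability Index: approximate US school grade."""
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43


if __name__ == "__main__":
    # Hypothetical counts for a short answer: 100 words, 5 sentences,
    # 150 syllables, 470 letters and digits.
    print(round(flesch_reading_ease(100, 5, 150), 3))
    print(round(flesch_kincaid_grade(100, 5, 150), 3))
    print(round(automated_readability_index(470, 100, 5), 3))
```

For FRE a higher score means easier text, while FKGL and ARI report an approximate grade level, so a simplified response would be expected to raise the first and lower the other two.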