Clinical Report: Patient and Clinician Perspectives on LLM Responses for Moyamoya Disease
Overview
This study evaluated two large language models (LLMs), ChatGPT-4o and Gemini 1.5 Flash, for the accuracy, safety, and helpfulness of their answers to common patient questions about moyamoya disease. While patients rated LLM responses comparably to physician answers, clinicians identified significant omissions related to risks, urgent symptoms, and recent research.
Background
Moyamoya disease is a rare cerebrovascular disorder requiring specialized knowledge for diagnosis and management. Patients often seek accessible information online, including from artificial intelligence chatbots. Large language models have become widely used tools for health information, but their reliability and safety in complex neurological conditions remain uncertain. Evaluating LLM responses against clinical standards is critical to understand their role in patient education and care.
Data Highlights
| Metric | ChatGPT | Gemini | p-value |
|---|---|---|---|
| Number of response sets evaluated by community | 27 | 20 | |
| Responses reported as "short" (%) | 1.2% | 20.8% | <0.001 |
| Failure to address risks of procedures/medications (%) | 38% | 28.6% | |
| Omission of when to consult medical professional (%) | 27.2% | 40.8% | |
| Community rating responses as similar or better than physician (%) | 72.2% (47.8% similar + 24.4% better) | 71.4% (49% similar + 22.4% better) | |
| Clinician rating of failure to address recent advances (%) | 57.5% | 62.5% | |
| Clinician rating of failure to address urgent symptoms (%) | 70.0% | 70.0% | |
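The arithmetic behind the composite community rating and the between-model gaps can be checked with a short script. This is a minimal sketch using only the percentages reported above; the variable names, the helper `combined_matches`, and the rounding tolerance are illustrative assumptions and not part of the study's analysis. It does not reproduce the reported p-value, because the underlying counts are not given here.

```python
# Community ratings (%) as reported in Data Highlights.
reported = {
    "ChatGPT": {"similar": 47.8, "better": 24.4, "combined": 72.2},
    "Gemini": {"similar": 49.0, "better": 22.4, "combined": 71.4},
}


def combined_matches(parts, tolerance=0.05):
    """Return True if similar + better equals the reported combined figure.

    The tolerance is an assumed allowance for rounding to one decimal place.
    """
    return abs(parts["similar"] + parts["better"] - parts["combined"]) <= tolerance


checks = {model: combined_matches(parts) for model, parts in reported.items()}

# Omission rates (%) reported for each model, as (ChatGPT, Gemini).
omissions = {
    "risks of procedures/medications": (38.0, 28.6),
    "when to consult a professional": (27.2, 40.8),
}

# Gap in percentage points; positive means ChatGPT omitted more often.
gaps = {
    name: round(chatgpt - gemini, 1)
    for name, (chatgpt, gemini) in omissions.items()
}

if __name__ == "__main__":
    print(checks)
    print(gaps)
```

The two omission metrics point in opposite directions: ChatGPT omitted procedure and medication risks more often, while Gemini more often omitted guidance on when to seek medical consultation.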
Key Findings
ChatGPT responses were significantly less likely to be "short" compared to Gemini (1.2% vs 20.8%, p < 0.001).
Both LLMs frequently failed to discuss potential risks associated with procedures and medications they mentioned (ChatGPT 38%, Gemini 28.6%).
Omission of guidance on when self-care is insufficient and medical consultation is needed was common (ChatGPT 27.2%, Gemini 40.8%).
Community respondents rated LLM answers as similar or somewhat better than physician-provided information in over 70% of cases (ChatGPT 72.2%, Gemini 71.4%).
Clinicians noted that LLM responses often lacked coverage of recent research advances (ChatGPT 57.5%, Gemini 62.5%) and failed to highlight urgent symptoms requiring referral (both 70%).
Overall, LLMs provide accessible information but have important safety and completeness limitations.
Clinical Implications
Clinicians should be aware that while patients may perceive LLMs as helpful and comparable to physician advice, these models frequently omit critical safety information and often fail to emphasize urgent clinical signs. Healthcare providers should guide patients to use LLM-generated information cautiously and reinforce the importance of professional evaluation for symptom changes or treatment decisions. Continued refinement of LLMs is needed to improve accuracy and safety in complex neurological diseases like moyamoya.
Conclusion
Large language models offer accessible moyamoya disease information that patients find comparable to physician responses; however, significant gaps in safety, risk communication, and up-to-date clinical guidance remain. Careful integration with clinical oversight is essential to optimize patient education and safety.
by Marcella R. Ruppert-Gomez, Joon Hyeok Choi, Steven J. Staffa, Katherine Holste, Jordan Xu, Catherine Stratton, Sophia D. Kocher, Edward R. Smith, Alfred Pokmeng See