Patient perspective on large-language model responses to questions about Moyamoya

By
Marcella R. Ruppert-Gomez
Joon Hyeok Choi
Steven J. Staffa
Katherine Holste
Jordan Xu
Catherine Stratton
Sophia D. Kocher
Edward R. Smith
Alfred Pokmeng See
February 26, 2026

Acta Neurochirurgica

Overview

This study evaluated two large language models (LLMs), ChatGPT-4o and Gemini 1.5 Flash, for accuracy, safety, and helpfulness in answering common patient questions about moyamoya disease. While patients rated LLM responses comparably to physician answers, clinicians identified substantial omissions related to risks, urgent symptoms, and recent research.

Background

Moyamoya disease is a rare cerebrovascular disorder requiring specialized knowledge for diagnosis and management. Patients often seek accessible information online, including from artificial intelligence chatbots. Large language models have become widely used tools for health information, but their reliability and safety in complex neurological conditions remain uncertain. Evaluating LLM responses against clinical standards is critical to understand their role in patient education and care.

Data Highlights

Metric	ChatGPT	Gemini	p-value
Number of response sets evaluated by community	27	20
Responses reported as "short" (%)	1.2%	20.8%	<0.001
Failure to address risks of procedures/medications (%)	38%	28.6%
Omission of when to consult medical professional (%)	27.2%	40.8%
Community rating responses as similar or better than physician (%)	72.2% (47.8% similar + 24.4% better)	71.4% (49% similar + 22.4% better)
Clinician rating of failure to address recent advances (%)	57.5%	62.5%
Clinician rating of failure to address urgent symptoms (%)	70.0%	70.0%
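
The combined "similar or better" figures in the table are the sum of their two reported components. As a minimal sketch of that arithmetic (the dictionary layout and the function name `similar_or_better` are illustrative assumptions, not part of the study), the totals can be checked as follows:

```python
# Component percentages copied from the community-rating row of the table.
# The data structure and function name are hypothetical, for illustration only.
ratings = {
    "ChatGPT": {"similar": 47.8, "better": 24.4},
    "Gemini": {"similar": 49.0, "better": 22.4},
}


def similar_or_better(parts):
    """Sum the 'similar' and 'better' shares, rounded to one decimal place."""
    return round(parts["similar"] + parts["better"], 1)


combined = {model: similar_or_better(parts) for model, parts in ratings.items()}

for model, total in combined.items():
    print(f"{model}: {total}% rated similar to or better than physician information")
```

Both totals reproduce the reported 72.2% and 71.4%, so the two models differ by less than one percentage point on this measure.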

Key Findings

ChatGPT responses were significantly less likely than Gemini responses to be reported as "short" (1.2% vs 20.8%, p < 0.001).
Both LLMs frequently failed to discuss potential risks associated with procedures and medications they mentioned (ChatGPT 38%, Gemini 28.6%).
Both LLMs often omitted guidance on when self-care is insufficient and medical consultation is needed (ChatGPT 27.2%, Gemini 40.8%).
Community respondents rated LLM answers as similar to or somewhat better than physician-provided information in over 70% of cases (ChatGPT 72.2%, Gemini 71.4%).
Clinicians noted that LLM responses often lacked coverage of recent research advances (ChatGPT 57.5%, Gemini 62.5%) and failed to highlight urgent symptoms requiring referral (both 70%).
Overall, LLMs provide accessible information but have important safety and completeness limitations.

Clinical Implications

Clinicians should be aware that although patients may perceive LLM responses as helpful and comparable to physician advice, these models often omit critical safety information and frequently fail to emphasize urgent clinical signs. Healthcare providers should guide patients to use LLM-generated information cautiously and reinforce the importance of professional evaluation for symptom changes or treatment decisions. Continued refinement of LLMs is needed to improve accuracy and safety in complex neurological diseases like moyamoya.

Conclusion

Large language models offer accessible moyamoya disease information that patients find comparable to physician responses; however, significant gaps in safety, risk communication, and up-to-date clinical guidance remain. Careful integration with clinical oversight is essential to optimize patient education and safety.

References

Original Source(s)

Patient perspective on large-language model responses to questions about Moyamoya
