Performance of large language models in delivering accurate and comprehensible patient information on heart failure and cardiomyopathy - Report - MDSpire
Clinical Report: Evaluation of Large Language Models in Patient Information on Heart Failure
Overview
This study evaluates how accurately and comprehensibly large language models (LLMs) answer patient questions about heart failure and cardiomyopathy. Six prominent LLMs were assessed on 50 expert-curated questions. Responses varied in readability and comprehensiveness, and Gemini performed best overall.
Background
Heart failure is a complex chronic condition affecting millions, necessitating effective patient education for self-management. The shift towards digital health has made online information a primary resource for patients, but the prevalence of misinformation poses risks. Evaluating LLMs for their ability to provide accurate and understandable health information is crucial for enhancing patient education.
Data Highlights
Model  | Readability (Flesch-Kincaid Grade) | Composite Mean Rating | Preferred Model Selection (%)
Gemini | 11.3 ± 1.9                         | 4.41 ± 0.77           | 43.7
Grok   | N/A                                | 4.23 ± 0.76           | 30.3
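The readability column reports the Flesch-Kincaid Grade Level, which estimates the US school grade needed to follow a text from its average sentence length and average syllables per word: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below shows one way such a score could be computed. It is an illustration only: the report does not say which tool the authors used, and the function names and the vowel-group syllable counter are assumptions made here, so scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption, not the study's method):
    count groups of consecutive vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    # Treat runs of ., ! or ? as sentence boundaries.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    plain = "The heart pumps blood. It is a strong muscle."
    dense = (
        "Cardiomyopathy necessitates comprehensive pharmacological "
        "intervention and multidisciplinary evaluation."
    )
    print(f"plain: {flesch_kincaid_grade(plain):.1f}")
    print(f"dense: {flesch_kincaid_grade(dense):.1f}")
```

Short sentences built from short words score low, while long sentences full of polysyllabic medical terms score high. On this scale, a mean of 11.3 corresponds roughly to an eleventh-grade reading level.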
Key Findings
Gemini provided the most readable responses but was also the most verbose.
Across 2,700 ratings, Gemini received the highest composite mean rating for completeness and factual reliability.
Confabulation avoidance scored consistently high across all models.
Conciseness scored the lowest among the evaluated domains.
Auto-graders rated the responses higher than medical students and experts did.
Clinical Implications
The findings suggest that while LLMs can provide accurate information, differences in readability and conciseness between models may affect how well patients understand it. Healthcare professionals should be aware of these differences when recommending digital health resources to patients.
Conclusion
The evaluation of LLMs highlights their potential in delivering patient information on complex conditions like heart failure, though attention to readability and conciseness is necessary for optimal patient comprehension.
by Christoph Reich, Jule Leverenz, Charlotte Brand, Lasse Niemeier, Isabel Branzei, Mustafa Yildirim, Farbod Sedaghat-Hamedani, Ali Amr, Norbert Frey, Benjamin Meder