Performance of large language models in delivering accurate and comprehensible patient information on heart failure and cardiomyopathy

By
Christoph Reich
Jule Leverenz
Charlotte Brand
Lasse Niemeier
Isabel Branzei
Mustafa Yildirim
Farbod Sedaghat-Hamedani
Ali Amr
Norbert Frey
Benjamin Meder
June 9, 2026

Frontiers in Digital Health

Overview

This study evaluates how accurately and comprehensibly large language models (LLMs) provide patient information on heart failure and cardiomyopathy. Six prominent LLMs were assessed on 50 expert-curated questions. Their responses varied in readability and comprehensiveness, with Gemini performing best overall.

Background

Heart failure is a complex chronic condition that affects millions of people, and effective patient education is essential for self-management. With the shift towards digital health, online information has become a primary resource for patients, but widespread misinformation poses risks. Evaluating whether LLMs can provide accurate and understandable health information is therefore crucial for improving patient education.

Data Highlights

Model	Readability (Flesch-Kincaid Grade)	Composite Mean Rating	Preferred Model Selection (%)
Gemini	11.3 ± 1.9	4.41 ± 0.77	43.7
Grok	N/A	4.23 ± 0.76	30.3
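
The readability column reports a Flesch-Kincaid Grade, which estimates the US school grade needed to follow a text: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The summary does not say which tool the authors used to compute it, so the following is only a minimal sketch of the standard formula. The function names and the vowel-group syllable heuristic are assumptions made for illustration, and dedicated readability tools count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not the study's method):
    count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Standard Flesch-Kincaid Grade Level formula."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    sample = "Heart failure means the heart cannot pump enough blood."
    print(round(flesch_kincaid_grade(sample), 1))
```

A higher grade means harder text, so Gemini's reported value of 11.3 corresponds to roughly an eleventh-grade reading level.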

Key Findings

Gemini provided the most readable responses but was also the most verbose.
Across 2,700 ratings, Gemini received the highest composite mean rating for completeness and factual reliability.
Confabulation avoidance scored consistently high across all models.
Conciseness scored the lowest among the evaluated domains.
Auto-graders rated the responses higher than medical students and experts did.

Clinical Implications

The findings suggest that while LLMs can provide accurate information, variability in readability and comprehensibility may affect patient understanding. Healthcare professionals should be aware of these differences when recommending digital health resources to patients.

Conclusion

The evaluation of LLMs highlights their potential in delivering patient information on complex conditions like heart failure, though attention to readability and conciseness is necessary for optimal patient comprehension.

