Large language models provide unsafe answers to patient-posed medical questions

By Rachel L. Draelos, Samina Afreen, Barbara Blasko, Tiffany L. Brazile, Natasha Chase, Dimple Patel Desai, Jessica Evert, Heather L. Gardner, Lauren Herrmann, Aswathy Vaikom House, Stephanie Kass, Marianne Kavan, Kirshma Khemani, Amanda Koire, Lauren M. McDonald, Zahraa Rabeeah, Amy Shah
February 13, 2026

npj Digital Medicine

Objective:

To assess the safety of the medical advice that four publicly available chatbots give in response to patient-posed questions.

Key Findings:

Safety differed by a statistically significant margin among the four chatbots.
The rate of problematic responses ranged from 21.6% (Claude) to 43.2% (Llama).
The rate of unsafe responses ranged from 5% (Claude) to 13% (GPT-4o and Llama).
Qualitative analysis identified responses that could lead to serious patient harm.

Interpretation:

The findings suggest that millions of patients may be receiving unsafe medical advice from publicly available chatbots, which points to an urgent need to improve their clinical safety.

Limitations:

The study evaluated only four chatbots, so the results may not reflect the capabilities and safety profiles of the broader range of available models.
The questions were limited to primary care topics, so the results may not generalize to other kinds of medical inquiries.

Conclusion:

Further work is needed to improve the clinical safety of large language models that provide medical advice.
