Large language models provide unsafe answers to patient-posed medical questions - Takeaways - MDSpire

By Rachel L. Draelos, Samina Afreen, Barbara Blasko, Tiffany L. Brazile, Natasha Chase, Dimple Patel Desai, Jessica Evert, Heather L. Gardner, Lauren Herrmann, Aswathy Vaikom House, Stephanie Kass, Marianne Kavan, Kirshma Khemani, Amanda Koire, Lauren M. McDonald, Zahraa Rabeeah, Amy Shah

February 13, 2026

  1. The study evaluates the safety of four large language model chatbots in providing medical advice to patients.

  2. A total of 888 responses to 222 medical inquiries were analyzed, revealing significant differences in safety among the chatbots.

  3. The rate of problematic responses ranged from 21.6% for Claude to 43.2% for Llama, indicating varying levels of safety.

  4. Unsafe responses were found in 5% of Claude's answers, while GPT-4o and Llama had unsafe rates of 13%.

  5. The findings suggest that many patients may receive unsafe medical advice from these chatbots, necessitating further safety improvements.
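To make the percentages above more concrete, the short sketch below converts them into approximate response counts. It assumes that each chatbot answered every one of the 222 questions exactly once, which is consistent with 888 responses across four chatbots but is not stated explicitly in the takeaways. The helper name `approx_count` is hypothetical, and the resulting counts are rounded estimates derived from the quoted rates, not figures reported by the study.

```python
# Figures quoted in the takeaways above.
TOTAL_RESPONSES = 888
TOTAL_QUESTIONS = 222

# Assumption: every chatbot answered every question once,
# so the number of chatbots is responses divided by questions.
n_chatbots = TOTAL_RESPONSES // TOTAL_QUESTIONS
responses_per_chatbot = TOTAL_QUESTIONS


def approx_count(rate_percent: float, n: int) -> int:
    """Convert a quoted percentage into an approximate number of responses."""
    return round(rate_percent / 100 * n)


# Rates as quoted in the takeaways; the unsafe rates appear to be rounded
# to whole percentages, so the derived counts are only rough estimates.
problematic_rates = {"Claude": 21.6, "Llama": 43.2}
unsafe_rates = {"Claude": 5.0, "GPT-4o": 13.0, "Llama": 13.0}

print(f"Chatbots implied by the totals: {n_chatbots}")
for name, rate in problematic_rates.items():
    count = approx_count(rate, responses_per_chatbot)
    print(f"{name}: about {count} problematic responses out of {responses_per_chatbot}")
for name, rate in unsafe_rates.items():
    count = approx_count(rate, responses_per_chatbot)
    print(f"{name}: about {count} unsafe responses out of {responses_per_chatbot}")
```

Under that assumption, the quoted range of problematic responses corresponds to roughly 48 to 96 of 222 answers per chatbot.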
