Leveraging simulation to provide a practical framework for estimating the novel scope of risk of large language models in healthcare - Report - MDSpire


By Mark Kalinich, James Luccarelli, John Santa Maria Jr, Frank Moss, and John Torous

June 24, 2026


Clinical Report: Utilizing Simulation to Assess Risks of Large Language Models

Overview

This study demonstrates a simulation-based method for assessing the risks of large language models deployed as software as a medical device (LLM-SaMD). It also shows that model performance varies considerably across different safety tasks.

Background

Large language models (LLMs) are increasingly integrated into healthcare, but their probabilistic outputs can create significant patient safety risks. Existing medical device risk management frameworks may not fully address the risks that are unique to LLMs.

Data Highlights

Task | P1 range | P2 range
Suicidal ideation | 1.1×10⁻⁸ to 1.6×10⁻⁴ | 4.9×10⁻⁵ to 5.1×10⁻³
Therapy request | Varied | Varied
Therapy-like interaction detection | Varied | Varied
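The summary does not define P1 and P2. If they follow the ISO 14971 convention (an assumption here), P1 is the probability that a hazardous situation occurs, P2 is the probability that the hazardous situation leads to harm, and the probability of harm is their product. The following minimal sketch bounds that product using the suicidal-ideation ranges from the table. Because the endpoints of each range may come from different models, the result is an outer bound for any single model, not a per-model estimate. The function name is hypothetical.

```python
def harm_probability_bounds(p1_range, p2_range):
    """Bound P(harm) = P1 * P2 from (low, high) ranges for P1 and P2.

    Assumes P1 and P2 follow the ISO 14971 convention and are probabilities
    in [0, 1]. Because both are non-negative, the product is smallest at the
    two lower ends and largest at the two upper ends.
    """
    for lo, hi in (p1_range, p2_range):
        if not 0.0 <= lo <= hi <= 1.0:
            raise ValueError("each range must satisfy 0 <= low <= high <= 1")
    (p1_lo, p1_hi), (p2_lo, p2_hi) = p1_range, p2_range
    return p1_lo * p2_lo, p1_hi * p2_hi


# Suicidal-ideation ranges from the table above.
low, high = harm_probability_bounds((1.1e-8, 1.6e-4), (4.9e-5, 5.1e-3))
print(f"P(harm) between {low:.2e} and {high:.2e}")
```

Under these assumptions, the bound spans roughly 5×10⁻¹³ to 8×10⁻⁷ per interaction.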

Key Findings

  • Fourteen open-source LLMs were evaluated on three safety-classification tasks.
  • Performance improved with model size, particularly in producing neutral, non-therapeutic content.
  • Models frequently erred in detecting suicidal ideation and therapy-like interactions.
  • The estimated hazard probabilities (P1 and P2) varied widely across tasks.
  • Simulation can link model failure modes to pathways of harm, aiding in risk assessment.
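To illustrate how simulation can connect a failure mode to a pathway of harm, the sketch below runs a toy Monte Carlo model of one hypothetical pathway. A simulated user expresses suicidal ideation, the classifier misses it, and the miss leads to harm. The rates (`prevalence`, `miss_rate`, `harm_given_miss`) and the function name are illustrative placeholders, not values or code from the study. In practice, the miss rate would come from running a model on simulated conversations. If P1 and P2 follow the ISO 14971 convention (an assumption), the first two stages correspond to P1 and the last to P2.

```python
import random


def simulate_pathway(n_conversations, prevalence, miss_rate, harm_given_miss, seed=0):
    """Toy Monte Carlo model of a hypothetical detection-failure harm pathway.

    Each simulated conversation passes through three stages:
      1. the user expresses suicidal ideation (probability `prevalence`),
      2. the classifier fails to flag it (probability `miss_rate`),
      3. the missed case leads to harm (probability `harm_given_miss`).
    All rates are hypothetical inputs, not values reported by the study.
    Returns (fraction with a hazardous situation, fraction with harm).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    hazardous, harmed = 0, 0
    for _ in range(n_conversations):
        if rng.random() >= prevalence:
            continue  # no ideation expressed in this conversation
        if rng.random() >= miss_rate:
            continue  # classifier flagged it; pathway ends safely
        hazardous += 1  # ideation expressed but undetected
        if rng.random() < harm_given_miss:
            harmed += 1
    return hazardous / n_conversations, harmed / n_conversations


n = 200_000
p_hazard, p_harm = simulate_pathway(n, prevalence=0.05, miss_rate=0.2, harm_given_miss=0.1)
analytic = 0.05 * 0.2 * 0.1  # product of the stage probabilities
print(f"simulated P(hazardous situation) = {p_hazard:.4f}")
print(f"simulated P(harm) = {p_harm:.5f} (analytic {analytic:.5f})")
```

The analytic product gives the same answer in this simple chain. Simulation becomes more useful when stages depend on one another or on conversation content, which a closed-form product cannot easily capture.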

Clinical Implications

Simulation-based risk estimation offers a method for evaluating the safety of LLM-SaMD in various clinical contexts.
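Many of the probabilities above are very small, which raises a practical question: how many simulated interactions are needed before an observed failure rate is informative? A standard approximation, not specific to this study, is the "rule of three." If zero failures are seen in n independent trials, the one-sided 95% upper confidence bound on the failure probability is about 3/n. This sketch computes the exact zero-failure bound, 1 − α^(1/n), and the number of failure-free trials needed to demonstrate a target rate. The 1×10⁻⁴ target and the function names are hypothetical.

```python
import math


def zero_failure_upper_bound(n_trials, alpha=0.05):
    """Exact one-sided upper confidence bound on a failure probability
    when no failures were observed in `n_trials` independent trials.

    Solves (1 - p) ** n_trials == alpha for p (Clopper-Pearson, zero events).
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return 1.0 - alpha ** (1.0 / n_trials)


def trials_needed(target_p, alpha=0.05):
    """Smallest number of failure-free trials whose upper bound is <= target_p."""
    if not 0.0 < target_p < 1.0:
        raise ValueError("target_p must be in (0, 1)")
    # log1p(-p) computes log(1 - p) accurately when p is tiny.
    return math.ceil(math.log(alpha) / math.log1p(-target_p))


target = 1e-4  # hypothetical acceptable failure rate, for illustration only
n = trials_needed(target)
print(f"{n} failure-free simulated interactions needed for a 95% bound of {target:g}")
print(f"rule-of-three approximation: {3 / target:.0f}")
print(f"check: bound after {n} trials = {zero_failure_upper_bound(n):.3e}")
```

The exact count and the 3/n shortcut agree closely (about 30,000 interactions for a 1×10⁻⁴ target). The same arithmetic shows that demonstrating rates near 1×10⁻⁸ with failure-free simulation alone would take hundreds of millions of trials.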

Conclusion

Simulation offers a practical way to estimate the novel risks that LLMs introduce in healthcare, where existing medical device risk management frameworks may fall short.
