Leveraging simulation to provide a practical framework for estimating the novel scope of risk of large language models in healthcare - Summary - MDSpire

By Mark Kalinich, James Luccarelli, John Santa Maria, Jr, Frank Moss, and John Torous

June 24, 2026

Objective:

To demonstrate how simulation can extend existing medical-device risk-management frameworks to address risks specific to LLM-based software as a medical device (LLM-SaMD).

Approach:
  • Simulation-based methodology: LLM-SaMD risk was estimated by evaluating fourteen open-source models on three safety-classification tasks: detection of suicidal ideation, therapy requests, and therapy-like interactions.
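
As a rough illustration of what one simulation step of this kind might look like, the sketch below scores a classifier against labelled synthetic messages and reports how often it misses a truly positive case, with a Wilson score interval around that rate. Everything in it is an assumption made for illustration: `toy_classifier` is a hypothetical keyword stub standing in for an LLM call, the four messages are made up, and `miss_rate` and `wilson_interval` are helper names invented here. It is not the study's pipeline, models, or data.

```python
import math


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials == 0:
        return (0.0, 1.0)
    p = failures / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def miss_rate(classify, labelled_messages):
    """Count truly positive messages that the classifier fails to flag."""
    positives = [text for text, label in labelled_messages if label]
    misses = sum(1 for text in positives if not classify(text))
    return misses, len(positives)


def toy_classifier(text):
    # Hypothetical stand-in for an LLM: flags a message only on a keyword match.
    return "suicide" in text.lower()


# Made-up synthetic examples; True marks a message that should be flagged.
synthetic = [
    ("I keep thinking about suicide", True),
    ("I don't see a reason to go on", True),
    ("What's the weather tomorrow?", False),
    ("Everyone would be better off without me", True),
]

misses, n = miss_rate(toy_classifier, synthetic)
low, high = wilson_interval(misses, n)
print(f"missed {misses}/{n}; 95% interval for miss rate: [{low:.2f}, {high:.2f}]")
```

With so few samples the interval is very wide, which is one reason a simulation of this kind would need a large synthetic dataset before the miss rate could feed a risk estimate.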
Key Findings:
  • LLM performance varied by task: models handled neutral content well but made frequent errors when detecting suicidal ideation and therapy-like interactions.
  • Model size generally correlated with improved performance.
  • Estimated P1 values ranged from 1.1×10⁻⁸ to 1.6×10⁻⁴ and P2 from 4.9×10⁻⁵ to 5.1×10⁻³.
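
The summary does not define P1 and P2. Assuming they follow the usual medical-device risk-management convention, in which P1 is the probability that a hazardous situation occurs and P2 is the probability that the hazardous situation leads to harm, the probability of harm is their product. The sketch below applies that assumed relationship to the endpoints of the reported ranges. Pairing the endpoints this way is purely illustrative: the resulting bounds are not results reported by the study, and the function name `probability_of_harm` is invented here.

```python
# Ranges as reported in the summary above.
P1_RANGE = (1.1e-8, 1.6e-4)
P2_RANGE = (4.9e-5, 5.1e-3)


def probability_of_harm(p1, p2):
    # Assumption: P(harm) = P1 * P2, with P2 conditional on the hazardous situation.
    return p1 * p2


# Illustrative envelope from pairing the lowest and highest endpoints.
low = probability_of_harm(P1_RANGE[0], P2_RANGE[0])
high = probability_of_harm(P1_RANGE[1], P2_RANGE[1])
print(f"illustrative P(harm) envelope: {low:.2e} to {high:.2e}")
```

Under these assumptions the envelope spans several orders of magnitude, which is consistent with the finding that risk depends heavily on the model and task.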
Interpretation:

Simulation can link model failure modes to structured pathways to harm, extending existing medical-device risk frameworks to address the risks of LLM-SaMDs.

Limitations:
  • The study relies on synthetic datasets and may not fully capture real-world complexities.
  • Results may vary based on the specific context and population in which LLM-SaMDs are deployed.
Conclusion:

Simulation-based risk estimation offers a practical way to characterize the risk landscape for specific LLM-SaMD applications.
