Leveraging simulation to provide a practical framework for estimating the novel scope of risk of large language models in healthcare

By
Mark Kalinich
James Luccarelli
John Santa Maria, Jr
Frank Moss
John Torous
June 24, 2026

BMJ Mental Health

Objective:

To demonstrate how simulation can extend existing medical-device risk management frameworks to address risks specific to LLM-based software as a medical device (LLM-SaMD).

Approach:

Simulation-based methodology: The authors estimated LLM-SaMD risk by evaluating fourteen open-source models on three safety-classification tasks: detection of suicidal ideation, therapy requests, and therapy-like interactions.
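As a rough illustration of what scoring one such safety-classification task might involve, the sketch below computes a false-negative rate (the share of truly positive messages a model misses). It is a minimal sketch and not the authors' pipeline: the function name, the label strings `"si"` and `"neutral"`, and the toy data are all hypothetical.

```python
def false_negative_rate(labels, predictions, positive):
    """Return the fraction of truly positive items the classifier missed."""
    # Keep only the predictions made on items whose true label is positive.
    preds_on_positives = [
        pred for lab, pred in zip(labels, predictions) if lab == positive
    ]
    if not preds_on_positives:
        raise ValueError("no positive examples to evaluate")
    missed = sum(1 for pred in preds_on_positives if pred != positive)
    return missed / len(preds_on_positives)


# Hypothetical toy data: "si" marks a message containing suicidal ideation.
labels = ["si", "si", "neutral", "si", "neutral", "si"]
predictions = ["si", "neutral", "neutral", "si", "neutral", "neutral"]

fnr = false_negative_rate(labels, predictions, positive="si")
print(f"false-negative rate: {fnr:.2f}")
```

A false-negative rate is used here because a missed detection is the failure mode most directly tied to harm in a safety classifier; other metrics could be substituted in the same way.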

Key Findings:

LLM performance varied by task, with strong results for neutral content but frequent errors in suicidal-ideation and therapy-like interactions.
Model size generally correlated with improved performance.
Estimated P1 values ranged from 1.1×10⁻⁸ to 1.6×10⁻⁴ and P2 from 4.9×10⁻⁵ to 5.1×10⁻³.
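To put the reported ranges in context, the sketch below multiplies P1 by P2 to bound the overall probability of harm. It assumes the usual ISO 14971 reading, which this summary does not spell out: P1 is the probability that a hazardous situation occurs and P2 is the probability that the hazardous situation leads to harm. Pairing the two minima and the two maxima gives only outer bounds, since the summary does not say which values belong to the same model or scenario.

```python
def harm_probability(p1, p2):
    """Probability of harm as P1 * P2 (assumed ISO 14971 convention)."""
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return p1 * p2


# Range endpoints reported in the summary.
P1_MIN, P1_MAX = 1.1e-8, 1.6e-4
P2_MIN, P2_MAX = 4.9e-5, 5.1e-3

# Outer bounds only: the extremes may come from different models or tasks.
low = harm_probability(P1_MIN, P2_MIN)
high = harm_probability(P1_MAX, P2_MAX)
print(f"P(harm) lies between about {low:.2e} and {high:.2e}")
```

Under these assumptions the bounds span roughly six orders of magnitude, which is one way to see why per-application estimates matter more than a single headline figure.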

Interpretation:

Simulation can link model failure modes to structured pathways to harm, extending existing medical-device risk frameworks to address the risks of LLM-SaMDs.

Limitations:

The study relies on synthetic datasets and may not fully capture real-world complexities.
Results may vary based on the specific context and population in which LLM-SaMDs are deployed.

Conclusion:

Simulation-based risk estimation offers a practical way to characterize the risk landscape for specific LLM-SaMD applications.

Original Source(s)

Leveraging simulation to provide a practical framework for estimating the novel scope of risk of large language models in healthcare
