Cracks in the AI Crystal Ball: Why Clinical Prediction Tools Fall Short in the Real World

By David Gamble, Andrew Wong, and Amiran Baduashvili
June 22, 2026

Journal of General Internal Medicine

Objective:

To evaluate the real-world performance of five Epic predictive AI tools and assess their accuracy in clinical settings.

Approach:

Key Findings:

Pooled AUROC estimates were consistently lower than Epic's reported benchmarks for all five models.
The size of the gap varied by model (Epic-reported AUROC to pooled AUROC): Sepsis Model (0.77 to 0.62), End-of-Life Care Index (0.89 to 0.76), Patient No-Show Model (0.77 to 0.62), Unplanned Readmission Model (0.74 to 0.70), and Deterioration Index (0.80 to 0.79).
High heterogeneity was observed across models, indicating performance variability across healthcare settings.
Model performance degradation may be attributed to data leakage and model drift.
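
To make the size of each gap easier to compare, the short Python sketch below subtracts the pooled AUROC from the reported AUROC for each model, using only the figures listed above. It assumes each pair reads as Epic-reported value first and pooled estimate second, which follows from the first finding but is not stated explicitly for each pair. The names `auroc` and `auroc_drops` are illustrative only.

```python
# AUROC pairs as listed in the summary: (Epic-reported, pooled estimate).
# Assumption: the first number in each pair is the vendor-reported value.
auroc = {
    "Sepsis Model": (0.77, 0.62),
    "End-of-Life Care Index": (0.89, 0.76),
    "Patient No-Show Model": (0.77, 0.62),
    "Unplanned Readmission Model": (0.74, 0.70),
    "Deterioration Index": (0.80, 0.79),
}


def auroc_drops(pairs):
    """Return {model: reported minus pooled AUROC}, rounded to 2 decimals."""
    return {
        name: round(reported - pooled, 2)
        for name, (reported, pooled) in pairs.items()
    }


drops = auroc_drops(auroc)

# Print the models from largest to smallest drop.
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: -{drop:.2f}")
```

On these figures the drop ranges from 0.15 (Sepsis Model and Patient No-Show Model) down to 0.01 (Deterioration Index), so the shortfall is far from uniform across the five tools.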

Interpretation:

The findings raise concerns about the accuracy and reliability of AI predictive models in clinical practice, particularly regarding their real-world performance compared to vendor claims.

Limitations:

The study did not include external validation of the updated Sepsis Model version 2.
Model performance may have been underestimated because data leakage was not accounted for in the analysis.

Conclusion:

The study highlights the need for careful evaluation of AI predictive models in clinical settings to ensure their effectiveness and reliability.

Sources:

Journal of General Internal Medicine


