Cracks in the AI Crystal Ball: Why Clinical Prediction Tools Fall Short in the Real World - Summary - MDSpire

By David Gamble, Andrew Wong, and Amiran Baduashvili

June 22, 2026


Objective:

To evaluate the real-world performance of five Epic predictive AI tools and assess their accuracy in clinical settings.

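Approach:

Pooled AUROC estimates from real-world evaluations of each model were compared against the benchmarks reported by Epic.
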
Key Findings:

  • Pooled AUROC estimates were consistently lower than Epic's reported benchmarks for all five models.
  • Epic's reported AUROC versus the pooled real-world estimate, by model: Sepsis Model (0.77 vs. 0.62), End-of-Life Care Index (0.89 vs. 0.76), Patient No-Show Model (0.77 vs. 0.62), Unplanned Readmission Model (0.74 vs. 0.70), and Deterioration Index (0.80 vs. 0.79).
  • High heterogeneity was observed across models, indicating that performance varies across healthcare settings.
  • The performance gap may be attributable to data leakage and model drift.

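As a rough illustration of the size of each gap, the sketch below subtracts the pooled estimate from Epic's reported benchmark for each model, using only the AUROC values listed in the findings. The function and variable names are hypothetical, and this is simple arithmetic on the summary figures, not a reconstruction of the study's pooling method.

```python
# AUROC pairs copied from the key findings:
# (Epic-reported benchmark, pooled real-world estimate).
REPORTED_VS_POOLED = {
    "Sepsis Model": (0.77, 0.62),
    "End-of-Life Care Index": (0.89, 0.76),
    "Patient No-Show Model": (0.77, 0.62),
    "Unplanned Readmission Model": (0.74, 0.70),
    "Deterioration Index": (0.80, 0.79),
}


def auroc_drops(values):
    """Return {model: absolute AUROC drop}, rounded to 2 decimals, largest drop first."""
    drops = {
        name: round(reported - pooled, 2)
        for name, (reported, pooled) in values.items()
    }
    # Sort by drop size so the models with the widest gaps come first.
    return dict(sorted(drops.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for name, drop in auroc_drops(REPORTED_VS_POOLED).items():
        print(f"{name}: -{drop:.2f}")
```

On these figures the drops range from 0.15 (Sepsis Model and Patient No-Show Model) down to 0.01 (Deterioration Index), so the shortfall is far from uniform across the five tools.
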
Interpretation:

The findings raise concerns about the accuracy and reliability of AI predictive models in clinical practice, particularly where real-world performance falls short of vendor claims.

Limitations:

  • The study did not include external validation of the updated Sepsis Model version 2.
  • Model performance may have been underestimated because data leakage was not accounted for in the analysis.

Conclusion:

The study highlights the need for careful evaluation of AI predictive models in clinical settings to ensure their effectiveness and reliability.
