Objective:
To evaluate the real-world performance of five Epic predictive AI tools and assess their accuracy in clinical settings.
Approach:
Key Findings:
Pooled AUROC estimates were consistently lower than Epic's reported benchmarks for all five models.
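Discrepancies between Epic's reported AUROC and the pooled estimate (reported to pooled) were found for the Sepsis Model (0.77 to 0.62), End-of-Life Care Index (0.89 to 0.76), Patient No-Show Model (0.77 to 0.62), Unplanned Readmission Model (0.74 to 0.70), and Deterioration Index (0.80 to 0.79). The gap was largest for the Sepsis and Patient No-Show Models (0.15 each) and smallest for the Deterioration Index (0.01).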
High heterogeneity was observed across models, indicating performance variability across healthcare settings.
Model performance degradation may be attributed to data leakage and model drift.
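To make the size of each discrepancy easier to compare, the reported and pooled AUROC values listed in the Key Findings can be tabulated and subtracted. The following is a minimal sketch, not the study's own analysis: the dictionary layout and the function name `auroc_gaps` are illustrative assumptions, and only the AUROC figures themselves come from the summary above.

```python
# Reported (Epic) vs. pooled AUROC values, copied from the Key Findings.
# The data structure and function name are illustrative, not from the study.
auroc = {
    "Sepsis Model": (0.77, 0.62),
    "End-of-Life Care Index": (0.89, 0.76),
    "Patient No-Show Model": (0.77, 0.62),
    "Unplanned Readmission Model": (0.74, 0.70),
    "Deterioration Index": (0.80, 0.79),
}


def auroc_gaps(values):
    """Return {model: reported - pooled}, rounded to 2 decimals, largest gap first."""
    gaps = {
        name: round(reported - pooled, 2)
        for name, (reported, pooled) in values.items()
    }
    # sorted() is stable, so models with equal gaps keep their original order.
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


gaps = auroc_gaps(auroc)
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}")
```

The absolute gap is only a descriptive comparison. It says nothing about statistical significance or about the heterogeneity between validation sites, both of which would require the study-level data that this summary does not include.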
Interpretation:
The findings raise concerns about the accuracy and reliability of AI predictive models in clinical practice, particularly regarding their real-world performance compared to vendor claims.
Limitations:
The study did not include external validation of the updated Sepsis Model version 2.
Model performance may have been underestimated because data leakage was not accounted for in the analysis.
Conclusion:
The study highlights the need for careful evaluation of AI predictive models in clinical settings to ensure their effectiveness and reliability.
Longer initial prescriptions, use of multiple benzodiazepines, and long-acting agents were associated with delayed discontinuation in a retrospective population-based cohort study.