Ensemble Machine Learning Models for Predicting Patients With High Usage: Model Validation and Economic Impact Analysis - Report - MDSpire

Ensemble Machine Learning Models for Predicting Patients With High Usage: Model Validation and Economic Impact Analysis

  • By Joshua Kuan Tan

  • February 20, 2026

Predictive Modeling of High Utilization Patients Using Ensemble Machine Learning

Overview

This study developed and validated multiclass ensemble machine learning models to predict inpatient length of stay (LOS) and emergency department (ED) visits among patients with diabetes. Boosted tree ensemble models showed the highest predictive performance, and a simulated economic analysis suggested a potential cost reduction of SGD 152 million (US $111 million).

Background

Healthcare expenditure is increasingly concentrated among a small subset of high-need, high-cost patients, particularly those with chronic diseases such as diabetes. Traditional predictive models often focus narrowly on the highest users, which may overlook patients with varying future healthcare needs. Machine learning approaches, especially ensemble models, offer promise in improving prediction accuracy across multiple levels of healthcare utilization, supporting targeted interventions and resource planning.

Data Highlights

| Metric | Length of Stay (LOS) | Emergency Department (ED) Visits |
|---|---|---|
| Multiclass AUROC | 0.6877 (95% CI 0.6927-0.7255) | 0.7601 (95% CI 0.7301-0.7654) |
| Accuracy | 0.6522 (95% CI 0.6465-0.6579) | 0.7457 (95% CI 0.7405-0.7508) |
| Correct Class Assignment | 30.3% | 39.8% |
| Identification of Future Users | 77.0% | 73.9% |
| Simulated Cost Reduction (SGD) | 152 million (boosted tree with logistic regression base learner) | Not specified |
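The summary does not say how the multiclass AUROC was aggregated across utilization classes. As a minimal sketch, assuming a one-vs-rest macro average (one common definition, not necessarily the one the study used), the two headline metrics in the table could be computed as shown here. The class labels, function names, and toy probabilities are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def multiclass_auroc_ovr(
    y_true: Sequence[str], proba: Sequence[Sequence[float]], classes: List[str]
) -> float:
    """Macro average of one-vs-rest AUROCs (an assumed aggregation scheme)."""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]
        aucs.append(binary_auroc(labels, scores))
    return sum(aucs) / len(aucs)


def accuracy(
    y_true: Sequence[str], proba: Sequence[Sequence[float]], classes: List[str]
) -> float:
    """Share of patients whose highest-probability class equals the true class."""
    preds = [classes[max(range(len(classes)), key=lambda k: row[k])] for row in proba]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical utilization classes and toy predicted probabilities.
classes = ["none", "low", "high"]
y_true = ["none", "low", "low", "high"]
proba = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],  # true class "low", but "none" scores highest
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

macro_auroc = multiclass_auroc_ovr(y_true, proba, classes)
acc = accuracy(y_true, proba, classes)
print(round(macro_auroc, 4), acc)
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a rank-based implementation would be preferable on a cohort-sized dataset.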

Key Findings

  • Boosted tree ensemble models outperformed random forest and linear support vector machines in predicting multilevel inpatient LOS and ED visits.
  • Models achieved multiclass AUROC scores of 0.6877 for LOS and 0.7601 for ED visits, indicating moderate discrimination for LOS and somewhat stronger discrimination for ED visits.
  • Accuracy for predicting correct utilization classes was 65.2% for LOS and 74.6% for ED visits.
  • The models identified 77% of future inpatient users and 73.9% of future ED users, supporting effective targeting.
  • Economic impact analysis showed a potential cost reduction of SGD $152 million (US $111 million) using the boosted tree model with logistic regression base learner.
  • These predictive models can inform population health programs and budgeting for diabetes-related care.
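The summary does not define "identification of future users" or "correct class assignment". One plausible reading, offered only as an assumption, is that the first counts true users who were predicted to fall in any non-zero utilization class, while the second counts true users who were placed in exactly the right class. The sketch below computes both under that reading; the class labels, function names, and toy data are hypothetical.

```python
from typing import Sequence


def share_of_users_identified(
    y_true: Sequence[str], y_pred: Sequence[str], non_user: str = "none"
) -> float:
    """Among true future users, the share predicted to be a user of any level."""
    users = [(t, p) for t, p in zip(y_true, y_pred) if t != non_user]
    if not users:
        raise ValueError("no future users in y_true")
    return sum(1 for _, p in users if p != non_user) / len(users)


def share_of_users_in_exact_class(
    y_true: Sequence[str], y_pred: Sequence[str], non_user: str = "none"
) -> float:
    """Among true future users, the share assigned to exactly the right class."""
    users = [(t, p) for t, p in zip(y_true, y_pred) if t != non_user]
    if not users:
        raise ValueError("no future users in y_true")
    return sum(1 for t, p in users if t == p) / len(users)


# Toy example: four true users, one true non-user.
y_true = ["none", "low", "high", "high", "low"]
y_pred = ["none", "high", "high", "none", "low"]

identified = share_of_users_identified(y_true, y_pred)
exact = share_of_users_in_exact_class(y_true, y_pred)
print(identified, exact)
```

Under this reading the first figure is always at least as large as the second, which would be consistent with the gap between the identification and class-assignment rows in the table.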

Clinical Implications

Ensemble machine learning models can enhance identification of patients at varying risk levels for high healthcare utilization, enabling more precise targeting of interventions. Incorporating these models into diabetes population health management may improve resource allocation and reduce costs. Clinicians and healthcare administrators should consider integrating such predictive tools to support proactive care planning.

Conclusion

Multiclass ensemble models, particularly boosted tree algorithms, effectively predict healthcare utilization levels among patients with diabetes and hold promise for generating meaningful cost savings. Their application can facilitate targeted interventions and optimize healthcare resource management.

References

  1. Tan JK et al. 2026 -- Predictive Modeling of High Utilization Patients Using Ensemble Machine Learning: Validation and Economic Impact Assessment
