Objective:
To benchmark machine learning algorithms for heart failure risk stratification and to establish a dual explainable-AI (dual-XAI) framework that improves interpretability and reliability for clinical deployment.
Key Findings:
Logistic regression outperformed the other benchmarked algorithms, achieving a ROC-AUC of 0.9451, which indicates strong discriminative ability.
Left Ventricular Ejection Fraction (LVEF) was identified as the most significant predictor, highlighting its clinical relevance.
The model had a false-negative rate of 18.42% (equivalently, a sensitivity of 81.58%), so nearly one in five high-risk patients would be missed; this calls for caution in clinical use.
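The two metrics reported above can be made concrete with a small sketch. The labels, scores, and thresholds below are toy values chosen for illustration, not data from the study, and the helper names `roc_auc` and `false_negative_rate` are hypothetical. ROC-AUC is computed here by its pairwise-ranking definition, and the false-negative rate as FN / (FN + TP).

```python
def roc_auc(y_true, scores):
    """Probability that a random positive case scores above a random
    negative case; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def false_negative_rate(y_true, y_pred):
    """FN / (FN + TP): share of truly high-risk cases predicted low-risk."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if fn + tp == 0:
        raise ValueError("no positive cases in y_true")
    return fn / (fn + tp)


def classify(scores, threshold):
    """Label a case high-risk (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy example (assumed values): 1 = high-risk, 0 = low-risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(y_true, scores)
fnr_default = false_negative_rate(y_true, classify(scores, 0.5))
fnr_lowered = false_negative_rate(y_true, classify(scores, 0.25))

print(f"ROC-AUC: {auc:.4f}")
print(f"FNR at threshold 0.50: {fnr_default:.2%}")
print(f"FNR at threshold 0.25: {fnr_lowered:.2%}")
```

In this toy data, lowering the threshold removes the missed high-risk case but flags more low-risk cases as high-risk. That trade-off is why a high ROC-AUC can coexist with a non-trivial false-negative rate.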
Interpretation:
The results suggest that simpler, interpretable models can outperform complex ensembles on moderate-sized datasets, offering practical guidance for clinical settings while preserving transparency.
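A comparison of this kind could be set up roughly as follows. This is a minimal sketch using scikit-learn on synthetic data, not the study's pipeline: the dataset, the choice of ensemble models, the hyperparameters, and the 5-fold stratified cross-validation are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, moderately sized, imbalanced dataset (assumed; not study data).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# Candidate models: one simple linear model and two tree ensembles.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = (fold_scores.mean(), fold_scores.std())

# Report models from highest to lowest mean ROC-AUC.
for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: ROC-AUC {mean:.3f} +/- {std:.3f}")
```

Which model ranks first on synthetic data says nothing about the clinical dataset. The sketch only shows the mechanics of scoring every candidate on the same folds with the same metric.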
Limitations:
The findings still require prospective validation against independent clinical outcomes.
Potential biases from class imbalance and hyperparameter optimization were not fully addressed, which may affect the generalizability of the results.
Conclusion:
The validated dual-XAI framework shows promise for heart failure risk stratification in clinical systems and underscores the need for systematic benchmarking and interpretability in ML applications.