To evaluate the value of moderate data augmentation for cardiovascular risk modeling and to propose an interpretable, deployable solution within a clear framework that moves from continuous risk assessment to thresholded classification.
Key Findings:
2× augmentation achieved a favorable compromise between error reduction (lower MAE and RMSE) and goodness of fit (higher R²).
The Random Forest (RF) model achieved an accuracy of 94.0%, F2 of 94.4%, sensitivity of 95.9%, and specificity of 91.8% after thresholding.
Key driving factors identified include oldpeak, num major vessels, chest pain type, thal, exang, and max hr.
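The metrics reported above can be illustrated with a minimal sketch in plain Python. It is not the study's implementation: the function names, the 0.5 threshold, and the toy labels and scores are hypothetical. It also assumes that "F2" denotes the F-beta score with beta = 2, which the source does not define.

```python
import math


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for continuous risk predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all targets are identical; 0.0 is returned then.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return mae, rmse, r2


def threshold_metrics(y_true, risk, threshold=0.5, beta=2.0):
    """Binarize continuous risk at `threshold` and score against 0/1 labels.

    The threshold value and beta are assumptions for illustration only.
    """
    y_hat = [1 if r >= threshold else 0 for r in risk]
    pairs = list(zip(y_true, y_hat))
    tp = sum(1 for t, h in pairs if t == 1 and h == 1)
    tn = sum(1 for t, h in pairs if t == 0 and h == 0)
    fp = sum(1 for t, h in pairs if t == 0 and h == 1)
    fn = sum(1 for t, h in pairs if t == 1 and h == 0)

    sensitivity = _safe_div(tp, tp + fn)  # recall on the positive class
    specificity = _safe_div(tn, tn + fp)  # recall on the negative class
    precision = _safe_div(tp, tp + fp)
    accuracy = _safe_div(tp + tn, len(pairs))
    b2 = beta * beta
    # F-beta weights recall beta times as heavily as precision.
    f_beta = _safe_div((1 + b2) * precision * sensitivity,
                       b2 * precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_beta": f_beta,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Toy data, not taken from the study.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
    print(regression_metrics(labels, scores))
    print(threshold_metrics(labels, scores, threshold=0.5))
```

With beta = 2, the F-beta score weights sensitivity more heavily than precision. That emphasis suits screening settings, where a missed high-risk patient is usually costlier than a false alarm.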
Interpretation:
Moderate data augmentation (preferably 2×) can significantly enhance robustness in small-sample settings; RF strikes a favorable balance between accuracy, stability, and interpretability.
Limitations:
The study is limited to a single heart disease classification dataset, so the results may not generalize to other conditions, datasets, or clinical scenarios.
Conclusion:
This study offers reusable guidance on the augmentation multiplier and a risk stratification scheme, providing a methodological foundation for effectively deploying interpretable cardiovascular risk models.