Enhancing clinically cardiovascular machine learning model for risk prediction via sample augmentation - Summary - MDSpire

  • By

  • Xiaoyu Tang

  • Min Tang

  • Wu Liu

  • Shaoyang Cui

  • June 9, 2026

Objective:

To evaluate the value of moderate data augmentation for cardiovascular risk modeling, and to propose an interpretable, deployable solution within a clear framework that moves from continuous risk assessment to thresholded classification.

Key Findings:
  • 2× augmentation achieved a favorable compromise between error reduction (lower MAE and RMSE) and goodness of fit (higher R²).
  • The Random Forest (RF) model achieved an accuracy of 94.0%, F2 of 94.4%, sensitivity of 95.9%, and specificity of 91.8% after thresholding.
  • Key driving factors identified include oldpeak, num major vessels, chest pain type, thal, exang, and max hr.
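
The metrics named in the findings above can be computed as in the following minimal sketch. It is an illustration only: the function names, the 0.5 threshold, and the toy values are assumptions and are not taken from the study. The first function scores continuous risk estimates (MAE, RMSE, R²); the second applies a threshold and reports accuracy, sensitivity, specificity, and F-beta with beta = 2, which weights sensitivity more heavily than precision.

```python
import math


def _safe_div(num, den):
    # Return 0.0 instead of raising when a metric is undefined.
    return num / den if den else 0.0


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 for continuous risk estimates."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": mae,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - _safe_div(ss_res, ss_tot),
    }


def threshold_metrics(y_true, risk, threshold=0.5, beta=2.0):
    """Binarize continuous risk at `threshold` (hypothetical value) and score it."""
    labels = [1 if r >= threshold else 0 for r in risk]
    tp = sum(1 for t, l in zip(y_true, labels) if t == 1 and l == 1)
    tn = sum(1 for t, l in zip(y_true, labels) if t == 0 and l == 0)
    fp = sum(1 for t, l in zip(y_true, labels) if t == 0 and l == 1)
    fn = sum(1 for t, l in zip(y_true, labels) if t == 1 and l == 0)
    sensitivity = _safe_div(tp, tp + fn)   # recall on diseased cases
    specificity = _safe_div(tn, tn + fp)   # recall on healthy cases
    precision = _safe_div(tp, tp + fp)
    b2 = beta * beta
    f_beta = _safe_div((1 + b2) * precision * sensitivity,
                       b2 * precision + sensitivity)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_beta": f_beta,
    }


# Toy values for illustration only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
reg = regression_metrics(y_true, risk)
cls = threshold_metrics(y_true, risk)
```

In practice the threshold would be chosen on validation data rather than fixed at 0.5; a lower threshold trades specificity for sensitivity.
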
Interpretation:

Moderate data augmentation (preferably 2×) can significantly enhance robustness in small sample settings; RF strikes a favorable balance between accuracy, stability, and interpretability.
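
To make "2× augmentation" concrete, here is a minimal sketch of one possible scheme. The summary does not say which augmentation method the authors used, so Gaussian jitter is an assumption chosen purely for illustration, as are the function name `augment_with_jitter`, the `noise_scale` parameter, and the toy rows. With `factor=2`, the output contains the original samples plus one perturbed copy of each, doubling the sample count.

```python
import random


def augment_with_jitter(rows, factor=2, noise_scale=0.05, seed=0):
    """Return the original rows plus (factor - 1) noisy copies of each row.

    rows: list of (features, label) pairs, where features is a list of floats.
    Noise is Gaussian with a standard deviation equal to `noise_scale` times
    each feature's own standard deviation. This is an illustrative choice,
    not the method reported in the study.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rng = random.Random(seed)  # fixed seed for reproducibility

    # Per-feature standard deviation, so the noise respects each feature's scale.
    n_features = len(rows[0][0])
    stds = []
    for j in range(n_features):
        column = [features[j] for features, _ in rows]
        mean = sum(column) / len(column)
        variance = sum((v - mean) ** 2 for v in column) / len(column)
        stds.append(variance ** 0.5)

    augmented = list(rows)  # keep every original sample unchanged
    for _ in range(factor - 1):
        for features, label in rows:
            noisy = [v + rng.gauss(0.0, noise_scale * s)
                     for v, s in zip(features, stds)]
            augmented.append((noisy, label))  # the label is never altered
    return augmented


# Hypothetical rows: (oldpeak, max hr) with a binary label; not study data.
rows = [([2.3, 150.0], 1), ([0.0, 172.0], 0), ([1.4, 130.0], 1), ([0.2, 165.0], 0)]
augmented = augment_with_jitter(rows, factor=2)
```

Jitter of this kind only makes sense for continuous features; categorical variables such as chest pain type or thal would need a different treatment. Augmenting only the training split, after the train/test division, keeps perturbed copies of test samples out of training.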

Limitations:
  • The study is limited to a specific heart disease classification dataset, which may not represent other conditions.
  • Results may not be generalizable to other datasets or clinical scenarios, limiting broader applicability.
Conclusion:

This study offers reusable guidance on choosing the augmentation multiple, together with a risk stratification scheme, providing a methodological foundation for deploying interpretable cardiovascular risk models.
