Enhancing Breast Cancer Surgery Outcome Prediction Using a SHAP-Interpreted XGBoost Model
Overview
A machine learning model based on perioperative data was developed to predict adverse postoperative outcomes in patients undergoing breast cancer surgery. Of the five algorithms compared, XGBoost showed the best predictive performance, and SHAP (SHapley Additive exPlanations) analysis identified the systemic immune-inflammation index, prognostic nutritional index, and age as the key predictive factors.
Background
Breast cancer remains a leading malignancy among women, and surgery is the primary treatment. Despite standardized surgical and adjuvant therapy, patients remain at significant risk of recurrence and metastasis, both of which worsen prognosis. Traditional statistical methods have limited sensitivity and specificity for predicting individual postoperative outcomes. Machine learning can improve predictive performance, but its adoption is often hindered by limited interpretability. SHAP addresses this by quantifying how much each feature contributes to an individual prediction, which makes complex models interpretable to clinicians.
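To make "feature contributions" concrete, the following is a minimal sketch that computes exact Shapley values by enumerating every coalition of features, assuming a made-up three-feature risk function. The feature names (SII, PNI, age), coefficients and patient values are hypothetical illustrations, not the study's model; SHAP's tree explainers use faster, tree-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    A feature "absent" from a coalition is replaced by its baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition S of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                present.add(i)
                with_i = [x[j] if j in present else baseline[j] for j in range(n)]
                # Marginal contribution of feature i when added to coalition S
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


# Hypothetical toy risk score over [SII, PNI, age]; NOT the study's XGBoost model.
def toy_risk(v):
    sii, pni, age = v
    elderly = 1.0 if age >= 65 else 0.0
    # Linear terms plus an SII-by-age interaction so contributions are non-trivial
    return 0.001 * sii - 0.04 * pni + 0.02 * age + 0.0005 * sii * elderly


patient = [900.0, 42.0, 70.0]   # hypothetical patient
baseline = [500.0, 50.0, 50.0]  # hypothetical reference patient

phi = shapley_values(toy_risk, patient, baseline)
for name, value in zip(["SII", "PNI", "age"], phi):
    print(f"{name}: {value:+.3f}")
# Additivity: contributions sum to f(patient) - f(baseline)
print(round(sum(phi), 6), round(toy_risk(patient) - toy_risk(baseline), 6))
```

The last line checks the additivity property that makes SHAP explanations readable: the per-feature contributions add up exactly to the difference between the patient's predicted risk and the baseline prediction, with the interaction term shared between SII and age.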
Data Highlights
Model | Internal validation AUC | External validation AUC | External validation specificity | External validation F1 score
XGBoost | 0.840 | 0.780 | 0.881 | 0.514
Random Forest | Not specified | Lower than XGBoost | Lower than 0.881 | Lower than 0.514
Gradient Boosting Machine | Not specified | Lower than XGBoost | Lower than 0.881 | Lower than 0.514
Support Vector Machine | Not specified | Lower than XGBoost | Lower than 0.881 | Lower than 0.514
Logistic Regression | Not specified | Lower than XGBoost | Lower than 0.881 | Lower than 0.514
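The metrics in the table are all computed from a model's predicted probabilities. The sketch below, using a small hypothetical cohort of ten patients (not the study's data) and an assumed 0.5 decision threshold, computes specificity, F1 score and AUC. AUC is computed here as the probability that a randomly chosen patient with an adverse outcome receives a higher score than a randomly chosen patient without one, with ties counting one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = adverse outcome."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def specificity(y_true, y_pred):
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def f1_score(y_true, y_pred):
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 10 patients, 2 adverse outcomes (NOT study data)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.60, 0.05, 0.25, 0.40, 0.70, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("specificity", specificity(y_true, y_pred))
print("F1", f1_score(y_true, y_pred))
print("AUC", roc_auc(y_true, scores))
```

On this toy cohort the sketch gives a specificity of 0.875 and an AUC of 0.875 but an F1 score of only 0.5. F1 is built from precision and recall on the positive class alone, so it can remain modest even when specificity is high, particularly when adverse outcomes are uncommon.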
Key Findings
The XGBoost model achieved the highest discrimination (AUC) for adverse postoperative outcomes, with an internal validation AUC of 0.840 and an external validation AUC of 0.780.
In the external validation cohort, XGBoost demonstrated superior specificity (0.881) and F1 score (0.514) compared to other machine learning models.
Calibration curves showed good agreement between predicted probabilities and actual adverse event rates for the XGBoost model.
Decision curve analysis confirmed that the XGBoost model provided the greatest clinical net benefit across most risk thresholds.
SHAP interpretability analysis identified systemic immune-inflammation index (SII), prognostic nutritional index (PNI), and patient age as the top three contributors to the model's predictions.
The study utilized readily available perioperative clinical data, enhancing the model's applicability in routine clinical practice.
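Calibration curves and decision curve analysis both operate on predicted probabilities. The sketch below assumes equal-width probability bins for the calibration curve and the standard decision-curve net benefit, NB = TP/n - (FP/n) × p_t / (1 - p_t), where p_t is the risk threshold for intervening; the bin count, thresholds and data are hypothetical and not taken from the study.

```python
def calibration_bins(y_true, probs, n_bins=5):
    """Group predictions into equal-width bins; return (mean predicted, observed rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes into the last bin
        bins[idx].append((t, p))
    curve = []
    for b in bins:
        if b:  # skip empty bins
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(t for t, _ in b) / len(b)
            curve.append((mean_pred, observed, len(b)))
    return curve


def net_benefit(y_true, probs, threshold):
    """Decision-curve net benefit of treating patients with prob >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted probabilities and outcomes (NOT study data)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.10, 0.20, 0.15, 0.30, 0.60, 0.05, 0.25, 0.40, 0.70, 0.35]

for mean_pred, observed, count in calibration_bins(y_true, probs):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  (n={count})")

for pt in (0.1, 0.2, 0.3):
    model_nb = net_benefit(y_true, probs, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold {pt:.1f}: model {model_nb:.3f}, treat all {all_nb:.3f}, treat none 0.000")
```

A well-calibrated model has observed rates close to the mean predicted probability in each bin. In decision curve analysis, a model adds value at a given threshold when its net benefit exceeds both the treat-all strategy and zero (treat none); evaluating net_benefit over a range of thresholds traces out the decision curve.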
Clinical Implications
The XGBoost model can be integrated into clinical workflows to identify breast cancer surgery patients at high risk of adverse postoperative outcomes, enabling tailored surveillance and intervention strategies. The identification of SII and PNI as key predictive factors highlights the importance of assessing systemic inflammation and nutritional status before surgery. This approach supports personalized medicine by combining robust prediction with transparent interpretability.
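SII and PNI are both derived from routine preoperative blood tests. This summary does not state the study's formulas, units or cut-offs, so the sketch below assumes the commonly used definitions: SII = platelets × neutrophils / lymphocytes (counts in 10^9/L), and PNI = serum albumin (g/L) + 5 × lymphocyte count (10^9/L), which is the SI-unit form of Onodera's index. The function names and patient values are hypothetical.

```python
def systemic_immune_inflammation_index(platelets, neutrophils, lymphocytes):
    """SII = platelets x neutrophils / lymphocytes (all counts in 10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets * neutrophils / lymphocytes


def prognostic_nutritional_index(albumin_g_per_l, lymphocytes):
    """Onodera PNI in SI units: albumin (g/L) + 5 x lymphocytes (10^9/L)."""
    return albumin_g_per_l + 5 * lymphocytes


# Hypothetical preoperative blood results for one patient
sii = systemic_immune_inflammation_index(platelets=250, neutrophils=4.0, lymphocytes=2.0)
pni = prognostic_nutritional_index(albumin_g_per_l=40.0, lymphocytes=2.0)
print(f"SII = {sii:.0f}, PNI = {pni:.1f}")  # prints "SII = 500, PNI = 50.0"
```

Higher SII values are generally read as greater systemic inflammation and lower PNI values as poorer nutritional and immune status; the direction and thresholds the study's model actually learned are not reported in this summary.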
Conclusion
The perioperative data-based XGBoost model, enhanced by SHAP analysis, effectively predicts adverse postoperative outcomes in breast cancer surgery patients and outperforms logistic regression and the other machine learning models evaluated. Its interpretability facilitates clinical adoption and targeted patient management.
References
Study Authors/Source/2024 -- Utilizing SHAP Analysis to Enhance Machine Learning Models for Predicting Negative Outcomes in Breast Cancer Surgical Procedures