An XGBoost-based model for detecting undiagnosed type 2 diabetes using routine physical and lifestyle data from a multi-center Chinese population - Summary - MDSpire
Objective: To develop and validate an interpretable machine learning model that identifies individuals with undiagnosed type 2 diabetes (T2D) using data from routine health checkups.
Approach:
Study Design: Retrospective, multi-center study analyzing data from 12 tertiary hospitals in China.
Model Development: Data from 11,382 individuals were used to develop an XGBoost model optimized using 5-fold cross-validation.
Validation: An independent test set of 1,026 individuals was used for internal validation, with model performance assessed using the area under the receiver operating characteristic curve (AUC).
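The cross-validation and AUC steps above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the function names `stratified_kfold` and `roc_auc` are hypothetical, and stratifying the folds by diabetes status is an assumption (the summary states only that 5-fold cross-validation was used). In a real workflow, each fold would wrap XGBoost training, and the AUC would be computed from the predicted probabilities for the held-out individuals.

```python
import random
from typing import List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) pairs that preserve the class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0  # running counter so fold sizes stay balanced across classes
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # shuffle within each class before dealing indices out
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    splits = []
    for f in range(k):
        # Every fold except f is used for training; fold f is held out.
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, sorted(folds[f])))
    return splits


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney U) AUC for binary labels (1 = T2D); ties get average ranks."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average 1-based rank of the tie group
        for t in range(start, end + 1):
            ranks[order[t]] = avg_rank
        start = end + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. scikit-learn's `StratifiedKFold` and `roc_auc_score` provide library equivalents of these two helpers.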
Key Findings:
The final model included 12 predictors. Fasting blood glucose was the most influential (relative importance 50.6%), followed by creatinine (6.6%), triglycerides (5.6%), age (5.1%), and low-density lipoprotein (5.0%).
The model achieved an AUC of 77.2% (95% CI: 70.3%–84.1%) on the independent test set.
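The summary does not say how the 95% CI for the AUC was obtained; DeLong's method and bootstrapping are common choices. As one illustration only, assuming the Hanley-McNeil normal approximation, an interval can be computed from the AUC and the numbers of positive and negative cases in the evaluation set. The function name `auc_ci_hanley_mcneil` is hypothetical, and this is not presented as the study's method.

```python
import math
from statistics import NormalDist
from typing import Tuple


def auc_ci_hanley_mcneil(auc: float, n_pos: int, n_neg: int, level: float = 0.95) -> Tuple[float, float]:
    """Approximate CI for an AUC using the Hanley-McNeil (1982) standard error.

    auc   -- AUC as a fraction in [0, 1] (0.772, not 77.2)
    n_pos -- number of individuals with T2D in the evaluation set
    n_neg -- number of individuals without T2D in the evaluation set
    """
    if n_pos < 1 or n_neg < 1:
        raise ValueError("need at least one positive and one negative case")
    # Approximate probabilities that two random positives both outrank one negative (q1)
    # and that one random positive outranks two random negatives (q2).
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(var)
    # Clip the interval to the valid AUC range [0, 1].
    return max(0.0, auc - half_width), min(1.0, auc + half_width)
```

Calling `auc_ci_hanley_mcneil(0.772, n_pos, n_neg)` would require the test set's case and non-case counts, which the summary does not report; the interval narrows as either count grows, and the minority class dominates the width.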
Interpretation:
Limitations:
The study is retrospective, and the model identifies concurrent (existing but undiagnosed) disease rather than predicting future onset.
Individuals excluded from the analysis had a greater comorbidity burden, which may limit generalizability.
Conclusion:
The model can assist in identifying high-risk individuals during standard health examinations.