An XGBoost-based model for detecting undiagnosed type 2 diabetes using routine physical and lifestyle data from a multi-center Chinese population

By Hui Xiao, Qian Xi, Ping Zeng, Jinjuan Hao, Qinghua He, Xiaoxia Wang, Chi Zhang

June 24, 2026


Objective:

To develop and validate an interpretable machine learning model to identify individuals with undiagnosed type 2 diabetes (T2D) using data from routine health checkups.

Approach:
  • Study Design: Retrospective, multi-center study analyzing data from 12 tertiary hospitals in China.
  • Model Development: Data from 11,382 individuals was used to develop an XGBoost model optimized with 5-fold cross-validation.
  • Validation: An independent test set of 1,026 individuals was used for internal validation, with model performance assessed using the area under the receiver operating characteristic curve (AUC).
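
The 5-fold cross-validation step mentioned above can be illustrated with a minimal, standard-library sketch. It only shows how development data might be split into five folds so that each fold serves once as the validation part. Stratifying by outcome is an assumption here (the summary does not say whether the folds were stratified), the labels are toy values rather than study data, and `stratified_kfold` is a hypothetical helper name. In practice a classifier such as XGBoost would be fit on each training part and scored on the matching validation part.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs, keeping the class mix similar in every fold."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Shuffle within each class, then deal indices round-robin across the folds.
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)

    # Each fold is the validation part once; the other k-1 folds form the training part.
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Toy labels (assumption, not study data): 1 = undiagnosed T2D, 0 = no T2D.
labels = [1] * 20 + [0] * 80
splits = list(stratified_kfold(labels, k=5))

for train, val in splits:
    positives = sum(labels[i] for i in val)
    print(f"train={len(train)} val={len(val)} positives_in_val={positives}")
```

Hyperparameters would then be chosen by the average validation score across the five folds, while the separate test set stays untouched until the final evaluation.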
Key Findings:
  • The final model included 12 predictors, with fasting blood glucose being the most influential (50.6%), followed by creatinine (6.6%), triglyceride (5.6%), age (5.1%), and low-density lipoprotein (5.0%).
  • The model achieved an AUC of 77.2% (95% CI: 70.3%–84.1%) on the independent test set.
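
For readers less familiar with the metric, the sketch below shows one way an AUC and an approximate 95% confidence interval can be computed from predicted scores. The AUC is calculated as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The percentile bootstrap used for the interval is an assumption chosen for illustration, since the summary does not state how the reported interval was derived. All scores are simulated toy values, and `auc` and `bootstrap_ci` are hypothetical helper names.

```python
import random


def auc(y_true, y_score):
    """AUC as P(score of a positive > score of a negative); ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an illustrative choice of method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain only one class
            stats.append(auc(yt, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Small hand-checkable case: 3 of the 4 positive/negative pairs are ranked correctly.
small_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(small_auc)  # 0.75

# Simulated toy scores (assumption, not study data).
rng = random.Random(42)
y_true = [1] * 30 + [0] * 120
y_score = [rng.gauss(0.6, 0.2) for _ in range(30)] + [rng.gauss(0.4, 0.2) for _ in range(120)]

point = auc(y_true, y_score)
lower, upper = bootstrap_ci(y_true, y_score)
print(f"AUC={point:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value of 77.2% means the model ranks a person with undiagnosed T2D above a person without it in roughly 77 of 100 such pairs.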
Interpretation:

Limitations:
  • The study design is retrospective and identifies concurrent disease status rather than predicting future onset.
  • The excluded population had a greater comorbidity burden, which may affect generalizability.
Conclusion:

The model can assist in identifying high-risk individuals during standard health examinations.
