Integrative Machine Learning for Predicting Running-Related Injuries
Overview
This study developed a machine learning (ML) framework that uses multidisciplinary risk factors to predict running-related injuries (RRIs) in competitive endurance runners. Random forest models achieved the best predictive performance, with an area under the receiver operating characteristic curve (AUC) of about 0.78, a moderate improvement over previous approaches that highlights the value of integrating diverse data types for individualized injury risk prediction.
Background
Running-related injuries (RRIs) are multifactorial and pose significant health and economic burdens for endurance athletes. Traditional prediction models have often focused on limited risk factors, lacking integration of genetic, biomechanical, nutritional, and training data. Machine learning offers a promising approach to handle complex, multidimensional data for personalized injury risk assessment. This study prospectively monitored 142 competitive runners over 12 months, collecting weekly data across multiple domains to develop and evaluate ML models for RRI prediction.
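Because each runner contributes many weekly samples, a model can look better than it is if weeks from the same runner appear in both the training and the test data. The summary does not say how the study split its data, so the following is only a minimal sketch of one common precaution, splitting by runner rather than by week. The function name `runner_grouped_folds`, the runner labels, and the fold count are all hypothetical.

```python
import random


def runner_grouped_folds(runner_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which every runner's weekly
    samples fall entirely in train or entirely in test for a given fold.

    Assumption: this grouping scheme is illustrative only; it is not
    taken from the study.
    """
    runners = sorted(set(runner_ids))
    random.Random(seed).shuffle(runners)
    # Deal the shuffled runners round-robin into folds.
    fold_of = {r: i % n_folds for i, r in enumerate(runners)}
    for fold in range(n_folds):
        train = [i for i, r in enumerate(runner_ids) if fold_of[r] != fold]
        test = [i for i, r in enumerate(runner_ids) if fold_of[r] == fold]
        yield train, test


# Hypothetical layout: 10 runners with 4 weekly samples each.
runner_ids = [f"runner_{r:02d}" for r in range(10) for _ in range(4)]

for train_idx, test_idx in runner_grouped_folds(runner_ids, n_folds=5):
    held_out = sorted({runner_ids[i] for i in test_idx})
    print(len(train_idx), len(test_idx), held_out)
```

Each fold holds out whole runners, so the score on the held-out fold reflects performance on athletes the model has never seen.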
Data Highlights
Parameter                                     Value
Number of runners                             142
Weekly samples collected                      6181
Monitoring duration                           12 months
Best model AUC (Random Forest)                0.781 ± 0.016 to 0.784 ± 0.014
Significance level for improved performance   q < 0.05
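A threshold written as q rather than p usually refers to a p-value adjusted for multiple comparisons. The summary does not name the correction that was used, so the sketch below simply assumes the Benjamini-Hochberg false discovery rate procedure as an illustration. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in the
    same order as the input.

    Assumption: BH is shown as a common choice; the study's actual
    correction method is not stated in the summary.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical raw p-values from five model comparisons.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print(q_values)
print(significant)
```

In this toy example the third and fourth comparisons have raw p-values below 0.05 but adjusted values just above it, which is the kind of difference a q threshold is meant to expose.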
Key Findings
Integration of multidisciplinary risk factors including genetics, biomechanics, nutrition, and training data enabled improved RRI prediction.
Random forest models outperformed the other ML algorithms, with AUCs of 0.781 to 0.784, indicating moderate discriminative performance.
Logistic regression performed significantly better when trained on a broader range of risk factors than when trained on high-quality subsets.
The study provides a reproducible ML framework and a valuable dataset for future large-scale injury prediction research.
Comparative analysis of ML methods revealed important interactions between data structure and model suitability for RRI prediction.
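An AUC of about 0.78 can be read as a ranking probability: given one randomly chosen sample that was followed by an injury and one that was not, the model assigns the higher risk score to the injured one roughly 78% of the time. The sketch below computes AUC directly from that definition. The function name, labels, and scores are hypothetical toy values, not study data.

```python
from itertools import product


def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: the fraction of (injured, uninjured)
    pairs in which the injured sample has the higher score. Ties count
    as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = injury followed, 0 = no injury.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.7, 0.3, 0.6]
print(round(roc_auc(labels, scores), 3))
```

In the toy data, 13 of the 15 injured/uninjured pairs are ranked correctly, giving an AUC of about 0.87. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.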
Clinical Implications
Clinicians and sports scientists can leverage integrative ML models incorporating diverse risk factors to better identify athletes at risk of RRIs. This approach supports personalized injury prevention strategies and targeted interventions. The reproducible framework and dataset facilitate ongoing refinement and validation of predictive tools in endurance running populations.
Conclusion
This study demonstrates that machine learning models integrating multidisciplinary risk factors can moderately improve the prediction of running-related injuries. The findings support the potential of data-driven, individualized injury risk assessment in competitive endurance runners.