Multidisciplinary prediction of running-related injuries using machine learning

By
Han Wu
Katherine Brooke-Wavell
Michael R. Barnes
Zainab Awan
Sarabjit Mastana
Sam Allen
Richard C. Blagrove
February 6, 2026

npj Digital Medicine

Overview

This study developed a machine learning (ML) framework that uses multidisciplinary risk factors to predict running-related injuries (RRIs) in competitive endurance runners. Random forest models achieved the best predictive performance, with an area under the ROC curve (AUC) of about 0.78. This is a moderate improvement over previous approaches and highlights the value of integrating diverse data types for individualized injury risk prediction.

Background

Running-related injuries (RRIs) are multifactorial and pose significant health and economic burdens for endurance athletes. Traditional prediction models have often focused on a limited set of risk factors and have not integrated genetic, biomechanical, nutritional, and training data. Machine learning offers a promising way to handle complex, multidimensional data for personalized injury risk assessment. This study prospectively monitored 142 competitive runners over 12 months, collecting weekly data across multiple domains to develop and evaluate ML models for RRI prediction.

Data Highlights

Parameter	Value
Number of runners	142
Weekly samples collected	6181
Monitoring duration	12 months
Best model AUC (Random Forest)	0.781 ± 0.016 to 0.784 ± 0.014
Significance level for improved performance	q < 0.05
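
To make the AUC values in the table concrete: AUC can be read as the probability that a model gives a randomly chosen injury week a higher risk score than a randomly chosen injury-free week. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (injured, uninjured) pairs ranked correctly.

    labels: 1 for an injury week, 0 for an injury-free week (toy encoding).
    scores: predicted risk, higher meaning more likely injured.
    Ties between an injured and an uninjured score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one injured and one uninjured sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 injury weeks, 5 injury-free weeks.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]
auc = roc_auc(labels, scores)
```

On this toy data 12 of the 15 injured/uninjured pairs are ranked correctly, so the AUC is 0.8. An AUC near 0.78 describes a similar situation: most, but far from all, such pairs are ordered correctly.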

Key Findings

Integration of multidisciplinary risk factors including genetics, biomechanics, nutrition, and training data enabled improved RRI prediction.
Random forest models outperformed the other ML algorithms, with an AUC of about 0.78 (0.781 ± 0.016 to 0.784 ± 0.014), indicating moderate discriminative ability.
Logistic regression performed significantly better when trained on a broader range of risk factors than when trained on high-quality subsets.
The study provides a reproducible ML framework and a valuable dataset for future large-scale injury prediction research.
Comparative analysis of ML methods revealed important interactions between data structure and model suitability for RRI prediction.
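
One practical consequence of this data structure is that each runner contributes many weekly samples, so weeks from the same runner can land in both training and test sets if samples are split at random. This summary does not say how the study split its data; the sketch below shows one common safeguard, assigning whole runners to folds, purely as an assumption. The names `runner_grouped_folds`, `runner_ids`, and `n_folds` are hypothetical.

```python
import random
from collections import defaultdict


def runner_grouped_folds(runner_ids, n_folds=5, seed=0):
    """Split sample indices into folds so each runner appears in one fold only.

    runner_ids: one entry per weekly sample, naming the runner it came from.
    Returns a list of n_folds lists of sample indices.
    """
    runners = sorted(set(runner_ids))
    rng = random.Random(seed)
    rng.shuffle(runners)  # randomize which runners share a fold

    # Deal runners out to folds round-robin, like dealing cards.
    fold_of_runner = {r: i % n_folds for i, r in enumerate(runners)}

    folds = defaultdict(list)
    for idx, runner in enumerate(runner_ids):
        folds[fold_of_runner[runner]].append(idx)
    return [folds[k] for k in range(n_folds)]


# Hypothetical toy data: 10 runners with 4 weekly samples each.
runner_ids = [r for r in range(10) for _ in range(4)]
folds = runner_grouped_folds(runner_ids, n_folds=5)
```

Each fold can then serve once as the test set while the remaining folds are used for training, so a model is always evaluated on runners it has not seen.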

Clinical Implications

Clinicians and sports scientists can leverage integrative ML models incorporating diverse risk factors to better identify athletes at risk of RRIs. This approach supports personalized injury prevention strategies and targeted interventions. The reproducible framework and dataset facilitate ongoing refinement and validation of predictive tools in endurance running populations.

Conclusion

This study demonstrates that machine learning models integrating multidisciplinary risk factors can moderately improve the prediction of running-related injuries. The findings support the potential of data-driven, individualized injury risk assessment in competitive endurance runners.



