Clinical Scorecard: Machine Learning Algorithm for Identifying High-Risk Individuals for Hepatitis C Infection
At a Glance
Category: Detail
Condition: Hepatitis C virus (HCV) infection
Key Mechanisms: Asymptomatic infection leading to undiagnosed cases; transmission linked to opioid epidemic and injection drug use; machine learning models using EHR data to predict infection risk
Target Population: Adults aged 18–79 years tested for HCV antibodies, RNA, or genotype
Care Setting: Clinical settings utilizing electronic health records for targeted screening
Key Highlights
HCV infection rates have increased amid the opioid epidemic, and one-third of infected individuals are unaware of their infection because it is often asymptomatic.
A gradient boosting machine (GBM) model using 275 sociodemographic and clinical features discriminated well between patients with and without HCV infection (C statistic 0.916).
The ML algorithm identified 75.63% of HCV cases in the highest risk decile, enabling efficient targeted screening with one positive case per six tests.
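The decile-based figures above can be reproduced from any set of risk scores and confirmed test results. The following is a minimal sketch, not the authors' code: the function name `top_decile_summary` and the toy data are hypothetical, and it assumes labels are coded 1 for a confirmed HCV case and 0 otherwise.

```python
from typing import Dict, Sequence


def top_decile_summary(scores: Sequence[float], labels: Sequence[int]) -> Dict[str, float]:
    """Summarize how strongly the top 10% of risk scores concentrates cases.

    Hypothetical helper: scores are model-predicted risks, labels are
    1 (confirmed HCV case) or 0 (no infection).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")

    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    n_top = max(1, len(ranked) // 10)  # size of the highest-risk decile
    positives_top = sum(label for _, label in ranked[:n_top])
    positives_all = sum(labels)

    return {
        "n_screened": n_top,
        # Share of all cases that fall inside the top decile.
        "capture_rate": positives_top / positives_all if positives_all else 0.0,
        # Tests needed in the top decile to find one positive case.
        "tests_per_positive": n_top / positives_top if positives_top else float("inf"),
    }


# Toy data (made up for illustration): 20 patients, 2 cases.
demo_scores = [i / 20 for i in range(20)]
demo_labels = [0] * 20
demo_labels[19] = 1  # highest-risk patient is a case
demo_labels[5] = 1   # a lower-risk patient is also a case
print(top_decile_summary(demo_scores, demo_labels))
```

On real data, a capture rate near 0.76 and roughly six tests per positive would correspond to the results reported above.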
Guideline-Based Recommendations
Diagnosis
Use machine learning algorithms applied to EHR data to identify individuals at high risk for HCV infection.
Screen adults aged 18 years and older for HCV antibodies, RNA, or genotype per CDC recommendations.
Management
Prioritize targeted screening for individuals stratified as high risk by ML models to optimize resource use.
Implement direct-acting antiviral therapies for confirmed HCV cases to achieve sustained virologic response.
Monitoring & Follow-up
Monitor risk stratification performance and update ML models with new data to maintain predictive accuracy.
Follow up on patients identified as high risk to ensure timely diagnostic testing and treatment initiation.
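One way to monitor performance is to recompute the C statistic on each new batch of tested patients and compare it with the reported value of 0.916. The sketch below is illustrative only: the function names and the 0.02 tolerance are assumptions rather than values from the study, and the pairwise calculation is suitable for small batches, not large cohorts.

```python
from typing import Sequence

BASELINE_C = 0.916  # C statistic reported for the GBM model


def c_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores higher than a random non-case.

    Ties count as half. Labels are 1 (case) or 0 (non-case).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0
            elif case_score == other_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def needs_review(current_c: float, baseline_c: float = BASELINE_C,
                 tolerance: float = 0.02) -> bool:
    """Flag the model for review when discrimination drops by more than
    `tolerance` (a hypothetical threshold) below the baseline."""
    return (baseline_c - current_c) > tolerance


# Toy batch (made up): two cases, two non-cases.
batch_c = c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(batch_c, needs_review(batch_c))
```

A flagged batch would be a prompt to investigate and possibly retrain on newer data, not proof that the model has degraded, since small batches give noisy estimates.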
Risks
Undiagnosed HCV infection contributes to ongoing transmission and liver-related morbidity and mortality.
Universal screening may lead to unnecessary testing and resource burden without targeted approaches.
Patient & Prescribing Data
The data cover adults tested for HCV infection in diverse clinical settings across Florida.
ML-based risk stratification can enhance identification of patients needing confirmatory testing and antiviral treatment, improving care efficiency and outcomes.
Clinical Best Practices
Incorporate ML algorithms into clinical workflows to identify high-risk patients without disrupting care.
Use a 6-month window of sociodemographic and clinical data prior to testing to inform risk prediction.
Validate and compare multiple ML models to select the most accurate tool for HCV risk prediction.
Address clinician barriers to discussing sensitive risk factors by leveraging data-driven screening tools.
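The 6-month lookback mentioned above can be sketched as a simple filter over dated EHR events. This is a rough illustration under stated assumptions: six months is approximated as 183 days, the event codes (`opioid_rx`, `ed_visit`) are made up, and `lookback_features` is a hypothetical helper, not part of the published pipeline.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Tuple

LOOKBACK_DAYS = 183  # assumption: approximates a 6-month window


def lookback_features(events: Iterable[Tuple[str, date]], test_date: date,
                      lookback_days: int = LOOKBACK_DAYS) -> Counter:
    """Count each event code recorded in the window before the HCV test.

    Events on the test date itself are excluded so that only information
    available before testing informs the prediction.
    """
    window_start = test_date - timedelta(days=lookback_days)
    return Counter(
        code for code, event_date in events
        if window_start <= event_date < test_date
    )


# Toy record (made up): four events for one patient.
demo_events = [
    ("opioid_rx", date(2023, 1, 15)),
    ("opioid_rx", date(2023, 5, 1)),
    ("ed_visit", date(2022, 6, 1)),   # too old, outside the window
    ("ed_visit", date(2023, 6, 30)),  # on the test date, excluded
]
print(lookback_features(demo_events, date(2023, 6, 30)))
```

Excluding events on or after the test date is a design choice to avoid leaking the outcome into the features; the source does not specify how the boundary was handled.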
by Suk-Chan Jang, Wei-Hsuan Lo-Ciganic, Pilar Hernandez-Con, Chanakan Jenjai, James Huang, Ashley Stultz, Shunhua Yan, Debbie L Wilson, Ashley Norse, Faheem W Guirgis, Robert L Cook, Christine Gage, Khoa A Nguyen, Patrick Hornes, Yonghui Wu, David R Nelson, Haesuk Park