Fair positive unlabeled learning for predicting undiagnosed Alzheimer’s disease in diverse electronic health records

Category	Detail
Condition	Alzheimer’s Disease (AD), a common neurodegenerative disease that is frequently underdiagnosed
Key Mechanisms	Semi-supervised positive unlabeled learning (SSPUL) with racial bias mitigation, applied to electronic health record (EHR) data
Target Population	Diverse racial and ethnic groups including non-Hispanic white, non-Hispanic African American, Hispanic Latino, and East Asian patients
Care Setting	Healthcare systems utilizing electronic health records, exemplified by UCLA Health

SSPUL achieved higher sensitivity (0.77–0.81) and area under the precision-recall curve (AUCPR, 0.81–0.87) than supervised models across diverse racial groups
SSPUL was also the fairest of the models compared, with the lowest cumulative parity loss, which helps address racial bias in AD diagnosis
Validation with polygenic risk scores confirmed higher genetic risk in labeled and predicted positive AD cases across multiple ethnic groups
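
The sensitivity and parity-loss figures above can be made concrete with a small sketch. The scorecard does not define cumulative parity loss, so this assumes one common formulation: for a given metric, sum the absolute log ratio of each group's value to a reference group's value (a cumulative version would add these sums across several metrics). Group names, labels, and predictions are toy values, not study data.

```python
import math

def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def parity_loss(metric_by_group, reference):
    """Assumed definition: sum of |ln(metric_g / metric_ref)| over non-reference groups."""
    ref = metric_by_group[reference]
    return sum(
        abs(math.log(value / ref))
        for group, value in metric_by_group.items()
        if group != reference
    )

# Toy (true label, predicted label) lists per group; illustrative only.
data = {
    "group_a": ([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]),
    "group_b": ([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]),
}

tpr = {group: sensitivity(y, p) for group, (y, p) in data.items()}
loss = parity_loss(tpr, reference="group_a")
print(tpr, round(loss, 3))
```

A loss of zero would mean every group has the same sensitivity as the reference group. Larger values indicate wider gaps.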

Incorporate semi-supervised positive unlabeled learning models to improve detection of undiagnosed AD in diverse populations
Utilize comprehensive EHR data including neurological and non-neurological features for prediction
Address racial and ethnic disparities by applying bias mitigation techniques in diagnostic algorithms
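
To illustrate the core idea behind positive unlabeled learning, the sketch below uses the classic Elkan and Noto score adjustment. This is a swapped-in technique chosen for brevity, not the SSPUL procedure itself, and it assumes diagnosed patients are a random sample of all true cases. The input scores are hypothetical outputs of a classifier trained to separate labeled from unlabeled patients.

```python
def estimate_label_frequency(scores_labeled_positive):
    """Estimate c = P(labeled | truly positive) as the mean score on held-out labeled positives."""
    return sum(scores_labeled_positive) / len(scores_labeled_positive)

def adjust_scores(scores, c):
    """Convert P(labeled | x) into an estimate of P(positive | x), capped at 1."""
    return [min(s / c, 1.0) for s in scores]

# Hypothetical classifier scores, i.e. estimates of P(labeled | x).
held_out_positive_scores = [0.5, 0.75, 0.25, 0.5]
unlabeled_scores = [0.1, 0.4, 0.6]

c = estimate_label_frequency(held_out_positive_scores)
p_ad = adjust_scores(unlabeled_scores, c)

# Unlabeled patients whose adjusted probability crosses an illustrative 0.5 cutoff.
candidates = [i for i, p in enumerate(p_ad) if p >= 0.5]
print(c, p_ad, candidates)
```

Because only some true cases carry a diagnosis, raw scores understate the probability of disease. Dividing by the estimated label frequency corrects for that under the stated assumption.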

Early identification of AD allows for timely lifestyle interventions and treatment planning
Consider integrating machine learning predictions with clinical assessments to enhance diagnostic accuracy

Regularly evaluate model performance across racial and ethnic groups to ensure equitable diagnostic accuracy
Monitor for label bias and update models with new data to maintain sensitivity and fairness
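
A recurring per-group check of this kind could look like the following sketch. It computes AUCPR as step-wise average precision (ties in scores are not specially handled) and flags groups that fall below a floor. The 0.9 floor, the group names, and the data are hypothetical, chosen only to show the mechanics.

```python
def average_precision(y_true, scores):
    """Area under the precision-recall curve via the step-wise average-precision sum."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at each rank where a true case is recovered
    return total / n_pos

def flag_groups(data_by_group, min_aucpr):
    """Return per-group AUCPR and the groups scoring below the chosen floor."""
    aucpr = {g: average_precision(y, s) for g, (y, s) in data_by_group.items()}
    flagged = sorted(g for g, value in aucpr.items() if value < min_aucpr)
    return aucpr, flagged

# Toy (true label, model score) lists per group; illustrative only.
snapshot = {
    "group_a": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "group_b": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]),
}

aucpr, flagged = flag_groups(snapshot, min_aucpr=0.9)
print(aucpr, flagged)
```

Running the same check on each new data snapshot makes a drop in one group's performance visible even when the overall figure stays stable.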

Underdiagnosis of AD is prevalent in underrepresented populations due to systemic biases and limited label availability
Reliance on supervised models without bias mitigation may perpetuate diagnostic disparities
Cultural stigma and lower awareness in some groups may delay diagnosis and treatment

Patients aged 65 and older from diverse racial and ethnic backgrounds with potentially undiagnosed AD

Early and equitable identification through SSPUL can facilitate timely interventions and reduce health disparities

Leverage semi-supervised learning approaches to utilize both labeled and unlabeled EHR data for AD prediction
Implement bias mitigation strategies to ensure fairness across racial and ethnic groups
Incorporate a wide range of clinical features beyond expert-selected variables to improve model robustness
Validate predictive models with genetic risk markers to confirm biological relevance
Continuously assess and update diagnostic tools to address evolving healthcare disparities
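
Validation against genetic risk markers can be sketched as a comparison of polygenic risk scores between predicted positive and predicted negative patients. The scorecard does not say which statistical test was used, so a one-sided permutation test on the difference in means is assumed here. The scores are made-up values and the function names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """One-sided p-value for mean(group_a) > mean(group_b) under random relabeling."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Made-up standardized polygenic risk scores; illustrative only.
prs_predicted_positive = [1.2, 0.9, 1.5, 1.1, 1.3, 1.0]
prs_predicted_negative = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]

p_value = permutation_p_value(prs_predicted_positive, prs_predicted_negative)
print(p_value)
```

A small p-value suggests that predicted positives carry higher genetic risk than predicted negatives, which is the pattern expected if the model is identifying true undiagnosed cases.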

Clinical Scorecard: Equitable Prediction of Undiagnosed Alzheimer’s Disease Using Fair Positive Unlabeled Learning in Diverse Electronic Health Records