Fair positive unlabeled learning for predicting undiagnosed Alzheimer’s disease in diverse electronic health records - Report - MDSpire

By Thai Tran, Mingzhou Fu, Jessica Fung, Sriram Sankararaman, David A. Elashoff, Keith Vossel, and Timothy S. Chang

November 27, 2025

Equitable Prediction of Undiagnosed Alzheimer’s Disease Using Fair Positive Unlabeled Learning

Overview

This study developed a semi-supervised positive unlabeled learning (SSPUL) model with racial bias mitigation to predict undiagnosed Alzheimer’s disease (AD) across diverse populations using electronic health records (EHR). SSPUL achieved higher sensitivity (0.77–0.81 versus 0.39–0.53) and a lower cumulative parity loss than supervised baseline models, suggesting that it can help address underdiagnosis in underrepresented racial and ethnic groups.
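To make the idea of learning from positive and unlabeled records concrete, here is a minimal sketch of the classic Elkan-Noto label-frequency correction. It is not the study's SSPUL model, whose internals this summary does not describe; it is a simpler, well-known technique swapped in for illustration. The function names, patient IDs, and scores are hypothetical, and the sketch assumes a separate classifier has already produced a score estimating the probability that a record is labeled (diagnosed).

```python
def estimate_label_frequency(scores_on_labeled_positives):
    """Estimate c = P(labeled | truly positive) as the mean classifier
    score on held-out labeled positives (Elkan-Noto estimator)."""
    if not scores_on_labeled_positives:
        raise ValueError("need at least one labeled positive")
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def corrected_probability(score, c):
    """Convert P(labeled | x) into an estimate of P(positive | x) = score / c,
    capped at 1.0."""
    return min(1.0, score / c)


def flag_unlabeled(scores_by_id, c, threshold=0.5):
    """Return the IDs of unlabeled records whose corrected probability
    reaches the threshold, in sorted order."""
    return sorted(
        record_id
        for record_id, score in scores_by_id.items()
        if corrected_probability(score, c) >= threshold
    )


# Made-up scores for illustration only.
c = estimate_label_frequency([0.5, 0.3, 0.4])
flagged = flag_unlabeled({"p1": 0.1, "p2": 0.3, "p3": 0.45}, c)
print(flagged)
```

This correction relies on the assumption that labeled positives are a random sample of all positives. If diagnosis rates differ by racial or ethnic group, that assumption is unlikely to hold with a single shared value of `c`, which is one way to see why a group-aware approach may be needed.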

Background

Alzheimer’s disease (AD) is the most common neurodegenerative disorder and a leading cause of death in older adults, with significant health and economic burdens. Early diagnosis is critical but is hindered by underdiagnosis, especially in minority populations such as non-Hispanic African Americans, Hispanic Latinos, and East Asians. Traditional diagnostic methods and Medicare claims data have limited sensitivity, and their detection rates differ across racial groups. Machine learning approaches using EHR data offer promise, but they often lack fairness considerations and do not fully leverage unlabeled data.

Data Highlights

| Metric | SSPUL range | Supervised baseline range |
| --- | --- | --- |
| Sensitivity | 0.77–0.81 | 0.39–0.53 |
| Area under the precision-recall curve (AUCPR) | 0.81–0.87 | 0.3–0.7 |
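For readers less familiar with the two metrics above, the following sketch shows one way to compute sensitivity and AUCPR separately for each group. It assumes reference labels and model scores are available for every record, uses average precision as the AUCPR estimator (one common choice; the summary does not say which estimator the study used), and ignores tied scores. The group names, records, and 0.5 threshold are made up for illustration.

```python
from collections import defaultdict


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve:
    the mean of the precision values at each rank where a positive appears."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return float("nan")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos


def metrics_by_group(records, threshold=0.5):
    """records: iterable of (group, true_label, score) tuples."""
    by_group = defaultdict(list)
    for group, label, score in records:
        by_group[group].append((label, score))
    results = {}
    for group, rows in by_group.items():
        labels = [label for label, _ in rows]
        scores = [score for _, score in rows]
        preds = [1 if s >= threshold else 0 for s in scores]
        results[group] = {
            "sensitivity": sensitivity(labels, preds),
            "aucpr": average_precision(labels, scores),
        }
    return results


# Toy records: (group, true label, model score).
records = [
    ("A", 1, 0.9), ("A", 0, 0.8), ("A", 1, 0.7), ("A", 0, 0.1),
    ("B", 1, 0.6), ("B", 1, 0.4), ("B", 0, 0.2),
]
results = metrics_by_group(records)
for group, m in sorted(results.items()):
    print(group, round(m["sensitivity"], 2), round(m["aucpr"], 2))
```

Reporting both metrics per group, rather than pooled, is what makes ranges like those in the table comparable across populations.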

Key Findings

  • SSPUL achieved higher sensitivity (0.77–0.81) and AUCPR (0.81–0.87) across non-Hispanic white, non-Hispanic African American, Hispanic Latino, and East Asian groups than supervised models (sensitivity 0.39–0.53, AUCPR 0.3–0.7).
  • SSPUL exhibited superior fairness, demonstrated by the lowest cumulative parity loss among evaluated models.
  • Shared and distinct neurological (e.g., memory loss) and non-neurological (e.g., decubitus ulcer) features were identified among labeled and unlabeled AD patients.
  • Polygenic risk scores were significantly higher in labeled and predicted-positive patients than in predicted-negative patients in the non-Hispanic white, Hispanic Latino, and East Asian groups (p < 0.001), which supports the validity of the model's predictions.
  • SSPUL leveraged unlabeled data to mitigate label bias inherent in EHRs, improving detection in underrepresented populations.
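The fairness finding above rests on cumulative parity loss, which this summary does not define. The sketch below assumes one definition used in some fairness toolkits: for each metric, sum the absolute log ratio of every group's value to a reference group's value, then add those sums across metrics. Lower values mean the groups are treated more alike. The metric names, group names, and numbers are hypothetical, and the study's exact formula may differ.

```python
import math


def parity_loss(metric_by_group, reference_group):
    """Sum of |ln(metric_g / metric_ref)| over all non-reference groups.
    All metric values must be strictly positive."""
    ref = metric_by_group[reference_group]
    return sum(
        abs(math.log(value / ref))
        for group, value in metric_by_group.items()
        if group != reference_group
    )


def cumulative_parity_loss(metrics, reference_group):
    """Add the parity losses of several metrics (e.g. TPR, PPV)."""
    return sum(parity_loss(m, reference_group) for m in metrics.values())


# Made-up per-group metric values for illustration only.
metrics = {
    "tpr": {"ref": 0.8, "group_a": 0.4, "group_b": 0.8},
    "ppv": {"ref": 0.5, "group_a": 0.5, "group_b": 0.25},
}
total = cumulative_parity_loss(metrics, reference_group="ref")
print(round(total, 3))
```

Under this definition, a model whose metrics are identical across groups scores 0. In the toy data each metric has one group at half the reference value, so each contributes ln 2 to the total.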

Clinical Implications

The SSPUL model can enhance early and equitable detection of undiagnosed AD across diverse racial and ethnic groups, addressing disparities in diagnosis. Incorporating unlabeled EHR data and bias mitigation strategies may improve clinical decision-making and resource allocation for underserved populations. This approach supports more inclusive screening and timely intervention strategies in clinical practice.

Conclusion

SSPUL with fairness considerations provides a robust and equitable method for predicting undiagnosed Alzheimer’s disease from EHR data. It outperformed traditional supervised models on sensitivity and AUCPR and showed the lowest cumulative parity loss across racial and ethnic groups. This methodology holds promise for improving AD detection and care in diverse populations.

References

  1. UCLA Health Study 2024 -- Equitable Prediction of Undiagnosed Alzheimer’s Disease Using Fair Positive Unlabeled Learning
