Objective:
To improve the prediction of undiagnosed Alzheimer's disease (AD) using semi-supervised positive-unlabeled learning (SSPUL), a method that learns from labeled positive cases together with unlabeled data, while addressing racial bias in diverse populations.
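To make the positive-unlabeled setting concrete, here is a minimal sketch of one classic correction step (the Elkan-Noto estimator), not the study's SSPUL procedure, which this summary does not describe. It assumes a classifier has already been trained to separate labeled from unlabeled patients and that labeled positives are a random sample of all true positives. The function names and example scores are hypothetical.

```python
def estimate_label_frequency(scores_labeled_pos):
    """Estimate c = P(labeled | truly positive) as the mean classifier
    score over held-out labeled positives (assumption: positives are
    labeled completely at random)."""
    return sum(scores_labeled_pos) / len(scores_labeled_pos)


def corrected_probability(score, c):
    """Turn P(labeled | x) into an estimate of P(positive | x) by
    dividing by c, capped at 1.0."""
    return min(1.0, score / c)


# Hypothetical scores from a labeled-vs-unlabeled classifier.
held_out_labeled_scores = [0.5, 0.4, 0.6]
unlabeled_scores = [0.45, 0.1, 0.3]

c = estimate_label_frequency(held_out_labeled_scores)
adjusted = [corrected_probability(s, c) for s in unlabeled_scores]
```

Unlabeled patients whose adjusted probability is high are the candidates for undiagnosed disease; the cutoff itself is a separate modeling choice.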
Key Findings:
SSPUL achieved a sensitivity of 0.77–0.81 and an area under the precision-recall curve (AUCPR) of 0.81–0.87, outperforming supervised baseline models (sensitivity: 0.39–0.53; AUCPR: 0.3–0.7).
SSPUL demonstrated superior fairness with the lowest cumulative parity loss.
The analysis identified shared and distinct features among labeled and unlabeled AD patients, including neurological indicators (e.g., memory loss) and non-neurological indicators (e.g., decubitus ulcer).
Polygenic risk scores were significantly higher in labeled and predicted positives compared to predicted negatives among non-Hispanic white, Hispanic Latino, and East Asian groups (p < 0.001).
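The metrics reported above can be illustrated with a small sketch. Sensitivity and AUCPR (computed here as average precision, ignoring tied scores) follow their standard definitions. The summary does not define cumulative parity loss, so the `parity_loss` function below is an assumption: it uses one common formulation, the sum of absolute log ratios of a metric between each group and a reference group. All data and group names are made up for illustration.

```python
import math


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values at the rank of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


def parity_loss(metric_by_group, reference):
    """Assumed definition: sum over non-reference groups of
    |log(metric_group / metric_reference)|. Zero means perfect parity."""
    ref = metric_by_group[reference]
    return sum(
        abs(math.log(value / ref))
        for group, value in metric_by_group.items()
        if group != reference
    )


# Toy labels and scores (hypothetical).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens = sensitivity(y_true, y_pred)
aucpr = average_precision(y_true, scores)

# Hypothetical per-group sensitivities; "group_a" is the reference.
loss = parity_loss({"group_a": 0.8, "group_b": 0.8, "group_c": 0.4}, "group_a")
```

Under this assumed definition, a cumulative figure would add such losses across several metrics, so a model with the lowest total shows the smallest between-group gaps overall.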
Interpretation:
The findings suggest that SSPUL can substantially improve the prediction of undiagnosed AD while promoting fairness across diverse racial and ethnic groups, which could lead to better diagnostic practices.
Limitations:
The study may be limited by the quality and completeness of electronic health records, which can vary widely.
Potential biases in the underlying data, such as historical disparities in healthcare access, may still affect the results despite mitigation efforts.
Conclusion:
SSPUL represents a promising approach to improve the equitable prediction of undiagnosed Alzheimer's Disease, addressing both sensitivity and fairness in diverse populations.