Clinical Scorecard: A semi-supervised transformer model for diagnosing rare diseases and subphenotyping using electronic health records: Insights from pulmonary case studies
Weakly supervised transformer model (WEST) that combines limited expert-labeled data with extensive probabilistic silver-standard labels from EHRs for rare disease diagnosis and subphenotyping
Target Population
Patients with rare pulmonary diseases represented in electronic health records
Care Setting
Clinical settings utilizing electronic health records, exemplified by Boston Children’s Hospital data
Key Highlights
Rare diseases are underdiagnosed due to low prevalence and limited clinician familiarity.
The WEST model integrates expert-validated labels with probabilistic labels derived from EHRs to improve rare disease diagnosis and subphenotyping.
In the pulmonary case studies, WEST outperforms existing methods in phenotype classification, subphenotype identification, and disease progression prediction.
Guideline-Based Recommendations
Diagnosis
Utilize computational phenotyping approaches to enhance detection of rare diseases from EHR data.
Incorporate both expert-labeled and silver-standard probabilistic labels to improve diagnostic model calibration.
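One way to combine the two label sources is a weighted training loss in which expert labels enter as hard targets and silver-standard labels enter as soft targets with a reduced weight. The following is a minimal sketch for a binary phenotype, not the loss used by WEST; the function names and the `silver_weight` parameter are assumptions made for illustration.

```python
import math

def binary_cross_entropy(target, prob, eps=1e-12):
    """Cross-entropy between a hard or soft target in [0, 1] and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def combined_loss(gold, silver, silver_weight=0.5):
    """Mean loss on expert (gold) labels plus a down-weighted mean loss on silver labels.

    gold:   list of (hard_label, predicted_prob) pairs, hard_label in {0, 1}
    silver: list of (soft_label, predicted_prob) pairs, soft_label in [0, 1]
    silver_weight: hypothetical knob controlling how much the noisy labels count
    """
    gold_loss = sum(binary_cross_entropy(t, p) for t, p in gold) / len(gold) if gold else 0.0
    silver_loss = sum(binary_cross_entropy(t, p) for t, p in silver) / len(silver) if silver else 0.0
    return gold_loss + silver_weight * silver_loss

# Example: two expert-labeled patients and three silver-labeled patients.
gold_pairs = [(1, 0.9), (0, 0.2)]
silver_pairs = [(0.8, 0.7), (0.3, 0.4), (0.6, 0.5)]
print(combined_loss(gold_pairs, silver_pairs, silver_weight=0.5))
```

A smaller `silver_weight` limits the influence of noisy silver-standard labels, while a value of 0 reduces the objective to training on expert labels alone.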
Management
Leverage model-derived subphenotypes to inform personalized disease management strategies.
Monitoring & Follow-up
Apply predictive modeling to anticipate disease progression using longitudinal EHR data.
Risks
Be aware of limitations due to noisy or incomplete labels derived from EHRs.
Consider privacy requirements and data use agreements that restrict access to patient-level EHR data.
Patient & Prescribing Data
Patients with rare pulmonary diseases identified through EHR data at Boston Children’s Hospital
WEST enables label-efficient learning, which reduces the manual annotation burden, supports accurate diagnosis, and reveals clinically relevant subphenotypes that may guide treatment decisions.
Clinical Best Practices
Combine limited high-quality expert labels with extensive silver-standard labels for model training to optimize diagnostic accuracy.
Iteratively refine probabilistic labels during model training to improve calibration and performance.
Use computational phenotyping models to uncover deeper clinical insights from routine EHR data beyond traditional manual chart review.
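The practice of iteratively refining probabilistic labels can be illustrated with a small self-training loop. This is a sketch under stated assumptions, not the WEST procedure: a one-feature logistic regression stands in for the transformer, and the `blend`, `silver_weight`, and `rounds` parameters, along with all function names, are hypothetical. Expert labels stay fixed; after each training round, every silver label is moved part of the way toward the model's current prediction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, targets, weights, epochs=200, lr=0.5):
    """Fit p(y=1|x) = sigmoid(w*x + b) to soft targets by weighted gradient descent."""
    w, b = 0.0, 0.0
    total = sum(weights)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t, wt in zip(xs, targets, weights):
            err = sigmoid(w * x + b) - t  # gradient of cross-entropy w.r.t. the logit
            grad_w += wt * err * x
            grad_b += wt * err
        w -= lr * grad_w / total
        b -= lr * grad_b / total
    return w, b

def refine_silver_labels(gold, silver, rounds=3, blend=0.5, silver_weight=0.3):
    """Alternate between fitting the model and updating the silver labels.

    gold:   list of (x, hard_label) pairs; these labels are never changed
    silver: list of (x, initial_probability) pairs
    Returns the final (w, b) and the refined silver probabilities.
    """
    silver_probs = [p for _, p in silver]
    xs = [x for x, _ in gold] + [x for x, _ in silver]
    weights = [1.0] * len(gold) + [silver_weight] * len(silver)
    w, b = 0.0, 0.0
    for _ in range(rounds):
        targets = [float(y) for _, y in gold] + silver_probs
        w, b = fit_logistic(xs, targets, weights)
        # Move each silver label part of the way toward the model's prediction.
        silver_probs = [
            (1.0 - blend) * p + blend * sigmoid(w * x + b)
            for (x, _), p in zip(silver, silver_probs)
        ]
    return (w, b), silver_probs

# Toy data: x is a single made-up feature, e.g. a scaled count of disease-related codes.
gold_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
silver_data = [(-1.5, 0.4), (1.5, 0.6), (0.5, 0.5)]
(w, b), refined = refine_silver_labels(gold_data, silver_data)
print(refined)
```

In this toy setting the uncertain silver labels at the extremes are pulled toward the classes that the expert-labeled examples support. Keeping `blend` below 1 retains part of the original silver-standard signal at each round.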