A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies

By Kimberly F. Greco, Zongxin Yang, Mengyan Li, Han Tong, Sara Morini Sweet, Alon Geva, Kenneth D. Mandl, Benjamin A. Raby, and Tianxi Cai

February 6, 2026

Clinical Scorecard: A weakly supervised transformer model for diagnosing rare diseases and subphenotyping using electronic health records: Insights from pulmonary case studies

At a Glance

Category | Detail
Condition | Rare diseases, specifically rare pulmonary conditions
Key Mechanisms | Weakly supervised transformer model (WEST) leveraging limited expert-labeled data and extensive probabilistic silver-standard labels from EHRs for diagnosis and subphenotyping
Target Population | Patients with rare pulmonary diseases represented in electronic health records
Care Setting | Clinical settings utilizing electronic health records, exemplified by Boston Children’s Hospital data

Key Highlights

  • Rare diseases are underdiagnosed due to low prevalence and limited clinician familiarity.
  • The WEST model integrates expert-validated and probabilistic labels from EHRs to improve rare disease diagnosis and subphenotyping.
  • WEST outperforms existing methods in phenotype classification, subphenotype identification, and disease progression prediction.

Guideline-Based Recommendations

Diagnosis

  • Utilize computational phenotyping approaches to enhance detection of rare diseases from EHR data.
  • Incorporate both expert-labeled and silver-standard probabilistic labels to improve diagnostic model calibration.
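
One way to picture training on both label types is a loss that accepts probabilities as targets. The sketch below is illustrative only and is not the WEST implementation: the function name `soft_label_loss`, the toy values, and the 0.5 weight applied to silver-standard labels are all assumptions made for this example.

```python
import math

def soft_label_loss(pred_probs, targets, weights):
    """Weighted binary cross-entropy that accepts soft targets in [0, 1]."""
    eps = 1e-12
    total = 0.0
    for p, y, w in zip(pred_probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / sum(weights)

# Hypothetical toy data: two expert-labeled (gold) patients with hard labels,
# followed by three patients with probabilistic silver-standard labels.
gold_targets = [1.0, 0.0]
silver_targets = [0.9, 0.2, 0.6]
targets = gold_targets + silver_targets

# Assumption for illustration: silver labels count half as much as gold labels.
weights = [1.0, 1.0] + [0.5] * len(silver_targets)

preds = [0.8, 0.1, 0.7, 0.3, 0.5]  # made-up model outputs
loss = soft_label_loss(preds, targets, weights)
print(f"combined loss: {loss:.4f}")
```

Down-weighting the noisier labels is one common design choice; the source does not state how WEST balances the two label sources.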

Management

  • Leverage model-derived subphenotypes to inform personalized disease management strategies.

Monitoring & Follow-up

  • Apply predictive modeling to anticipate disease progression using longitudinal EHR data.

Risks

  • Be aware of limitations due to noisy or incomplete labels derived from EHRs.
  • Consider privacy and data use agreements restricting access to patient-level EHR data.

Patient & Prescribing Data

Patients with rare pulmonary diseases identified through EHR data at Boston Children’s Hospital

WEST enables label-efficient learning, which reduces the manual annotation burden, supports accurate diagnosis, and reveals clinically relevant subphenotypes that may guide treatment decisions.

Clinical Best Practices

  • Combine limited high-quality expert labels with extensive silver-standard labels for model training to optimize diagnostic accuracy.
  • Iteratively refine probabilistic labels during model training to improve calibration and performance.
  • Use computational phenotyping models to uncover deeper clinical insights from routine EHR data beyond traditional manual chart review.
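
The idea of iteratively refining probabilistic labels can be sketched with a toy self-training loop. This is a minimal illustration under stated assumptions, not the WEST algorithm: it substitutes a one-feature logistic regression for the transformer, and the blending rule, the `blend` parameter, the function names, and the data are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, epochs=500, lr=0.5):
    """Fit p = sigmoid(w*x + b) to soft targets ys by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # cross-entropy gradient w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def refine_silver_labels(gold_x, gold_y, silver_x, silver_y, rounds=3, blend=0.5):
    """Alternate between fitting the model and updating silver labels.

    Gold labels stay fixed; each silver label moves toward the model's
    prediction by the (assumed) blending factor.
    """
    silver_y = list(silver_y)  # work on a copy
    for _ in range(rounds):
        w, b = fit_logistic(gold_x + silver_x, gold_y + silver_y)
        silver_y = [(1 - blend) * y + blend * sigmoid(w * x + b)
                    for x, y in zip(silver_x, silver_y)]
    return silver_y, (w, b)

# Hypothetical single feature, e.g. a standardized count of disease-related codes.
gold_x, gold_y = [-2.0, -1.5, 1.5, 2.0], [0.0, 0.0, 1.0, 1.0]
silver_x, silver_y = [-1.0, 1.0, 1.8], [0.4, 0.6, 0.3]  # last label is deliberately noisy

refined, params = refine_silver_labels(gold_x, gold_y, silver_x, silver_y)
print([round(y, 3) for y in refined])
```

In this toy setup the deliberately noisy label at `x = 1.8` is pulled upward because the gold labels and the other silver labels agree that high feature values indicate disease.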
