A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies

By
Kimberly F. Greco
Zongxin Yang
Mengyan Li
Han Tong
Sara Morini Sweet
Alon Geva
Kenneth D. Mandl
Benjamin A. Raby
Tianxi Cai
February 6, 2026

npj Digital Medicine

At a Glance

Category	Detail
Condition	Rare diseases, specifically rare pulmonary conditions
Key Mechanisms	Weakly supervised transformer model (WEST) leveraging limited expert-labeled data and extensive probabilistic silver-standard labels from EHRs for diagnosis and subphenotyping
Target Population	Patients with rare pulmonary diseases represented in electronic health records
Care Setting	Clinical settings utilizing electronic health records, exemplified by Boston Children’s Hospital data

Key Highlights

Rare diseases are underdiagnosed due to low prevalence and limited clinician familiarity.
The WEST model integrates expert-validated and probabilistic labels from EHRs to improve rare disease diagnosis and subphenotyping.
WEST outperforms existing methods in phenotype classification, subphenotype identification, and disease progression prediction.

Guideline-Based Recommendations

Diagnosis

Utilize computational phenotyping approaches to enhance detection of rare diseases from EHR data.
Incorporate both expert-labeled and silver-standard probabilistic labels to improve diagnostic model calibration.
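As an illustration of the second recommendation, the sketch below shows one simple way to train against both label types and then check calibration on the expert-labeled subset. It is not the WEST implementation: the function names, the `silver_weight` value, and the use of a weighted cross-entropy with a Brier score are all assumptions made for this example.

```python
import math

def soft_bce(p_pred, target):
    """Binary cross-entropy against a target in [0, 1] (hard or probabilistic)."""
    eps = 1e-12
    p = min(max(p_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def combined_loss(preds, gold, silver, silver_weight=0.3):
    """Weighted average loss; a gold label overrides a silver label.

    preds:  dict patient_id -> predicted probability of disease
    gold:   dict patient_id -> 0 or 1 (expert chart review, few patients)
    silver: dict patient_id -> probability in [0, 1] (noisy, many patients)
    silver_weight is a hypothetical down-weighting of the noisier labels.
    """
    total, weight_sum = 0.0, 0.0
    for pid, p in preds.items():
        if pid in gold:
            w, y = 1.0, float(gold[pid])
        elif pid in silver:
            w, y = silver_weight, silver[pid]
        else:
            continue  # unlabeled patient contributes nothing
        total += w * soft_bce(p, y)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

def brier_score(preds, gold):
    """Mean squared error of predicted probabilities on expert-labeled patients."""
    ids = [pid for pid in gold if pid in preds]
    return sum((preds[pid] - gold[pid]) ** 2 for pid in ids) / len(ids)

# Toy data: two expert-labeled patients and one silver-labeled patient.
gold = {"a": 1, "b": 0}
silver = {"c": 0.7}
preds = {"a": 0.9, "b": 0.2, "c": 0.6}
print(combined_loss(preds, gold, silver), brier_score(preds, gold))
```

A lower Brier score on the expert-labeled patients indicates better-calibrated probabilities; in practice that subset would be held out from training.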

Management

Leverage model-derived subphenotypes to inform personalized disease management strategies.

Monitoring & Follow-up

Apply predictive modeling to anticipate disease progression using longitudinal EHR data.

Risks

Be aware of limitations due to noisy or incomplete labels derived from EHRs.
Consider privacy and data use agreements restricting access to patient-level EHR data.

Patient & Prescribing Data

Patients with rare pulmonary diseases identified through EHR data at Boston Children’s Hospital

WEST enables label-efficient learning, which reduces the manual annotation burden, supports accurate diagnosis, and reveals clinically relevant subphenotypes that may guide treatment decisions.

Clinical Best Practices

Combine limited high-quality expert labels with extensive silver-standard labels for model training to optimize diagnostic accuracy.
Iteratively refine probabilistic labels during model training to improve calibration and performance.
Use computational phenotyping models to uncover deeper clinical insights from routine EHR data beyond traditional manual chart review.
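The first two practices above can be sketched as a small training loop. This is a toy illustration under stated assumptions, not the WEST procedure: a one-feature logistic model stands in for the transformer, and the blending rule (`alpha`), the `silver_weight`, and all function names are hypothetical choices made for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, targets, weights, steps=500, lr=0.5):
    """Fit a one-feature logistic model to soft targets by gradient descent."""
    w, b = 0.0, 0.0
    total = sum(weights)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y, wt in zip(xs, targets, weights):
            err = sigmoid(w * x + b) - y  # gradient of soft cross-entropy w.r.t. the logit
            gw += wt * err * x
            gb += wt * err
        w -= lr * gw / total
        b -= lr * gb / total
    return w, b

def refine(xs, gold, silver, rounds=3, alpha=0.5, silver_weight=0.3):
    """Alternate between fitting the model and updating the silver labels.

    xs:     one numeric feature per patient (stand-in for EHR-derived features)
    gold:   dict patient_index -> 0 or 1; these labels are never changed
    silver: list of probabilistic labels, one per patient
    """
    silver = list(silver)
    n = len(xs)
    for _ in range(rounds):
        targets = [float(gold[i]) if i in gold else silver[i] for i in range(n)]
        weights = [1.0 if i in gold else silver_weight for i in range(n)]
        w, b = fit_logistic(xs, targets, weights)
        for i, x in enumerate(xs):
            if i not in gold:
                # Move each silver label part of the way toward the model's prediction.
                silver[i] = (1 - alpha) * silver[i] + alpha * sigmoid(w * x + b)
    return silver, (w, b)

xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
gold = {0: 0, 5: 1}                       # two expert-labeled patients
silver = [0.3, 0.4, 0.45, 0.55, 0.6, 0.7]  # noisy initial labels
refined, params = refine(xs, gold, silver)
print(refined)
```

In this toy setting the two expert labels anchor the model, and the uncertain silver labels sharpen toward the side of the decision boundary that the expert labels support.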

References

Original Source(s)

A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies
