Clinical Report: Information Content Screening Identifies Rare Diseases in EHR
Overview
This study demonstrates that information content (IC) metrics applied to SNOMED CT-coded electronic health records (EHR) can screen for rare disease (RD) patients. In a Singapore health system dataset of 1,274,199 patients, the method achieved approximately 95% sensitivity and surfaced 71 underdiagnosed rare diseases, highlighting its potential for early RD detection.
Background
Rare diseases affect an estimated 3.5–5.9% of the global population but often remain undiagnosed for 4–7 years, causing prolonged patient distress and delayed care. Traditional coding systems like ICD-10 inadequately capture RD, leading to fragmented and incomplete health records. SNOMED CT offers greater granularity and, combined with information-theoretic approaches such as information content, may enable earlier identification of RD patients within electronic health records.
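As a rough illustration of the information content idea mentioned above, the sketch below scores each coded term by how rarely it appears, using IC = -log2(p), where p is the term's relative frequency. The log base, the frequency definition (plain term counts that ignore the SNOMED CT hierarchy), the function name information_content, and the toy term labels are all assumptions made for illustration; the source does not specify how the study computed IC.

```python
import math
from collections import Counter


def information_content(term_counts):
    """Return IC = -log2(p) per term, where p is the term's relative frequency.

    Assumption: plain corpus frequency, with no roll-up of counts along the
    SNOMED CT hierarchy. The study may define the probability differently.
    """
    total = sum(term_counts.values())
    return {term: -math.log2(count / total) for term, count in term_counts.items()}


# Hypothetical toy corpus: readable labels, not real SNOMED CT codes or study data.
observed_terms = (
    ["hypertension"] * 6 + ["type 2 diabetes"] * 3 + ["fabry disease"] * 1
)
ic = information_content(Counter(observed_terms))

# Rarer terms receive higher IC than common ones.
for term, value in sorted(ic.items(), key=lambda item: item[1]):
    print(f"{term}: {value:.2f}")
```

Under this definition a term seen once in ten observations carries more information than one seen six times in ten, which is the property that lets rarely used codes stand out in a patient record.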
Data Highlights
Metric | Value
Patient cohort size | 1,274,199
Unique SNOMED terms | 35,898
Rare disease patients identified | 17,575
Sensitivity of IC screening | ~95%
Precision at 3 encounters (IC threshold 8.17) | 20%
Underdiagnosed rare diseases surfaced | 71 (57 genetic origin)
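For readers less familiar with the two screening metrics reported above, the following minimal sketch shows how sensitivity and precision are computed from confusion-matrix counts. The counts are hypothetical and were chosen only so that the results match the reported order of magnitude (about 95% and 20%); they are not figures from the study, and the source does not state that both values were measured at the same operating point.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual rare disease patients that the screen flags."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Share of flagged patients who actually have a rare disease."""
    return true_positives / (true_positives + false_positives)


# Hypothetical counts for illustration only (not study data).
tp, fn, fp = 95, 5, 380

sens = sensitivity(tp, fn)   # 95 / (95 + 5)
prec = precision(tp, fp)     # 95 / (95 + 380)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
```

A precision near 20% means roughly four of every five flagged patients would turn out not to have a rare disease, which is why the screen is framed as a trigger for follow-up rather than as a diagnosis.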
Key Findings
Information content (IC) computed on SNOMED CT data distinguishes rare disease patient profiles as early as the first clinical encounter.
The screening method achieves approximately 95% sensitivity, enabling early detection of RD candidates.
Precision reaches about 20% from three clinical encounters onward at an IC threshold of 8.17.
The approach identified 71 underdiagnosed rare diseases in the population, 57 of them genetic in origin.
According to the study, this is the first known application of information-theoretic metrics to EHR data for rare disease screening at health system scale.
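The threshold rule described in the findings above could be sketched as follows. The threshold (8.17) and the encounter count (3) come from the report; everything else is an assumption made for illustration: the patient-level score is taken as the mean IC of the coded terms, the comparison is "greater than or equal", patients with fewer than three encounters are not flagged, and the names patient_score, flag_for_review, and the toy IC values are hypothetical. The source does not describe how the study aggregated IC per patient.

```python
from statistics import mean

IC_THRESHOLD = 8.17   # threshold reported in the source
MIN_ENCOUNTERS = 3    # encounter count reported in the source


def patient_score(encounters, ic_by_term):
    """Mean IC over all coded terms in a patient's encounters.

    Assumption: the mean is used as the aggregate; the study may use another.
    Terms without a known IC value are skipped.
    """
    values = [ic_by_term[t] for enc in encounters for t in enc if t in ic_by_term]
    return mean(values) if values else 0.0


def flag_for_review(encounters, ic_by_term,
                    threshold=IC_THRESHOLD, min_encounters=MIN_ENCOUNTERS):
    """Flag a patient once enough encounters exist and the score meets the threshold."""
    if len(encounters) < min_encounters:
        return False
    return patient_score(encounters, ic_by_term) >= threshold


# Hypothetical IC values and patients; each inner list is one encounter's terms.
ic_by_term = {"common_term": 2.0, "rare_term": 12.0, "very_rare_term": 14.5}
patient_a = [["rare_term"], ["very_rare_term"], ["rare_term", "common_term"]]
patient_b = [["common_term"], ["common_term"], ["common_term", "rare_term"]]
patient_c = [["rare_term"], ["very_rare_term"]]  # only two encounters

print(flag_for_review(patient_a, ic_by_term))  # mean IC 10.125, three encounters
print(flag_for_review(patient_b, ic_by_term))  # mean IC 4.5
print(flag_for_review(patient_c, ic_by_term))  # too few encounters
```

Gating on a minimum number of encounters trades earlier detection for fewer false alarms; a deployment that prioritizes sensitivity from the first encounter could set min_encounters to 1.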
Clinical Implications
Implementing IC-based screening within EHR systems can facilitate earlier identification of patients likely to have rare diseases, potentially shortening the diagnostic odyssey. The method’s simplicity and reliance on existing SNOMED CT coding make it feasible for integration into routine clinical workflows, supporting targeted follow-up and resource allocation. This approach may improve equity in rare disease diagnosis by surfacing cases that traditional coding and referral pathways miss.
Conclusion
Information content metrics applied to SNOMED CT-coded EHR data are a promising screening tool for rare diseases, enabling high-sensitivity detection and uncovering underdiagnosed conditions. The method offers a scalable strategy to enhance rare disease identification and improve patient outcomes at both hospital and health system levels.
References
Study Authors/2023 -- Utilizing Information Content as a Screening Mechanism for Identifying Rare Diseases in Health Systems