Information content as a health system screening tool for rare diseases - Report - MDSpire

By Tudor Groza, Peter N. Robinson, Weng Khong Lim, Kaavya Narasimhalu, Jenny Hsieh, Khung Keong Yeo, Goh Bee Keow, Terrence Thomas, Tien Yin Wong, Neerja Karnani, Gareth Baynam, Saumya Shekhar Jamuar

November 25, 2025

Clinical Report: Information Content Screening Identifies Rare Diseases in EHR

Overview

This study demonstrates that information content (IC) metrics applied to SNOMED CT-coded electronic health records can effectively screen for rare disease (RD) patients. Using a large Singapore health system dataset, the method achieved approximately 95% sensitivity and identified underdiagnosed rare diseases, highlighting its potential for early RD detection.

Background

Rare diseases affect an estimated 3.5–5.9% of the global population but often remain undiagnosed for 4–7 years, causing prolonged patient distress and delayed care. Traditional coding systems like ICD-10 inadequately capture RD, leading to fragmented and incomplete health records. SNOMED CT offers greater granularity and, combined with information-theoretic approaches such as information content, may enable earlier identification of RD patients within electronic health records.
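For readers unfamiliar with the term, information content is conventionally defined as the negative logarithm of how often a term occurs, so rarely recorded terms score higher than common ones. The sketch below illustrates that textbook definition on a toy cohort. It is not the study's implementation: the function name, the term labels (which are not real SNOMED CT codes), the base-2 logarithm, and the choice to count each term once per patient are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_information_content(patient_terms):
    """Return IC per term as log2(n_patients / n_patients_carrying_term).

    Assumption: each term is counted once per patient, and hierarchy
    (descendant) counts are ignored for simplicity.
    """
    n_patients = len(patient_terms)
    carriers = Counter(
        term for terms in patient_terms.values() for term in set(terms)
    )
    return {term: math.log2(n_patients / count) for term, count in carriers.items()}


# Toy cohort with made-up labels (hypothetical, not real SNOMED CT codes).
cohort = {
    "p1": ["hypertension", "cough"],
    "p2": ["hypertension"],
    "p3": ["hypertension", "cough"],
    "p4": ["hypertension", "ataxia"],
}

ic = term_information_content(cohort)
for term, value in sorted(ic.items(), key=lambda item: item[1]):
    print(f"{term}: {value:.2f} bits")
```

In this toy cohort the term carried by every patient scores 0 bits, while the term seen in a single patient scores highest, which is the property that makes IC a candidate signal for unusual clinical profiles.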

Data Highlights

Metric                                          Value
Patient cohort size                             1,274,199
Unique SNOMED CT terms                          35,898
Rare disease patients identified                17,575
Sensitivity of IC screening                     ~95%
Precision at 3 encounters (IC threshold 8.17)   20%
Underdiagnosed rare diseases surfaced           71 (57 of genetic origin)

Key Findings

  • Information content (IC) effectively distinguishes rare disease patient profiles from the first clinical encounter using SNOMED CT data.
  • The screening method achieves approximately 95% sensitivity, enabling early detection of RD candidates.
  • Precision reaches approximately 20% from three clinical encounters onward at an IC threshold of 8.17.
  • The approach identified 71 underdiagnosed rare diseases in the population, 57 of them genetic in origin.
  • This is the first known application of information-theoretic metrics to EHR data for rare disease screening at a health system scale.
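The findings above can be read as a threshold rule evaluated with sensitivity and precision. The following is a minimal sketch of that reading, not the study's pipeline. Only the threshold of 8.17 and the three-encounter point come from the report. The `Patient` fields, the patient-level `profile_ic` score (the report summary does not say how term-level IC is aggregated per patient), the assumption that higher scores indicate rare disease, and the toy records are all hypothetical.

```python
from dataclasses import dataclass

IC_THRESHOLD = 8.17   # threshold quoted in the report
MIN_ENCOUNTERS = 3    # encounter count quoted in the report


@dataclass
class Patient:
    profile_ic: float        # hypothetical patient-level IC score
    encounters: int          # number of clinical encounters so far
    has_rare_disease: bool   # reference label, used for evaluation only


def is_flagged(patient):
    """Assumed rule: enough encounters and a score at or above the threshold."""
    return (
        patient.encounters >= MIN_ENCOUNTERS
        and patient.profile_ic >= IC_THRESHOLD
    )


def sensitivity_and_precision(patients):
    """Compute sensitivity (recall) and precision of the flagging rule."""
    tp = sum(1 for p in patients if is_flagged(p) and p.has_rare_disease)
    fp = sum(1 for p in patients if is_flagged(p) and not p.has_rare_disease)
    fn = sum(1 for p in patients if not is_flagged(p) and p.has_rare_disease)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Made-up records for illustration only.
toy_patients = [
    Patient(9.0, 4, True),    # flagged, true positive
    Patient(8.5, 3, False),   # flagged, false positive
    Patient(7.0, 5, True),    # below threshold, false negative
    Patient(9.5, 1, False),   # too few encounters, not flagged
    Patient(8.17, 3, True),   # exactly at threshold, true positive
]

sensitivity, precision = sensitivity_and_precision(toy_patients)
print(f"sensitivity={sensitivity:.2f}, precision={precision:.2f}")
```

A rule of this shape trades precision for sensitivity: lowering the threshold catches more true rare disease patients but flags more patients without one, which is consistent with the report pairing high sensitivity with a precision of about 20%.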

Clinical Implications

Implementing IC-based screening within EHR systems can facilitate earlier identification of patients likely to have rare diseases, potentially shortening the diagnostic odyssey. The method’s simplicity and reliance on existing SNOMED CT coding make it feasible for integration into routine clinical workflows, supporting targeted follow-up and resource allocation. This approach may improve equity in rare disease diagnosis by surfacing cases that traditional coding and referral pathways miss.

Conclusion

Information content metrics applied to SNOMED CT-coded EHR data represent a promising screening tool for rare diseases, enabling high sensitivity detection and uncovering underdiagnosed conditions. This method offers a scalable strategy to enhance rare disease identification and improve patient outcomes at both hospital and health system levels.

References

  1. Study Authors/2023 -- Utilizing Information Content as a Screening Mechanism for Identifying Rare Diseases in Health Systems
