Improving disease misclassification and prevalence estimates by linking primary and secondary care electronic health records: an illustration from arthritis research

By Belay Birlie Yimer, Fangyuan Zhang, Jenny Humphreys, Mark Lunt, Meghna Jani, John McBeth, William G Dixon

September 17, 2025

Clinical Scorecard: Enhancing Accuracy of Disease Classification and Prevalence Assessments through Integration of Primary and Secondary Care Electronic Health Records: A Case Study in Arthritis Research

At a Glance

Category | Detail
Condition | Psoriatic arthritis (PsA)
Key Mechanisms | Linkage of primary care EHR data with text-mined secondary care outpatient letters to identify and correct misclassification (false positives and false negatives) in disease coding
Target Population | Adults aged ≥18 years registered with primary care practices in Northwest England
Care Setting | Primary care and hospital rheumatology outpatient clinics

Key Highlights

  • Primary care codes alone underestimated PsA prevalence roughly twofold, largely because of false negatives (true cases with no PsA code in the primary care record).
  • Text mining of hospital outpatient letters enabled identification of true diagnoses absent from primary care coding.
  • Linking primary and secondary care data and adjusting for misclassification raised the estimated PsA prevalence from 0.13% to 0.25%.

Guideline-Based Recommendations

Diagnosis

  • Use validated code lists and algorithms for disease classification in EHR data to minimize false positives and false negatives.
  • Link primary care data with secondary care records, including text-mined outpatient letters, to improve diagnostic accuracy.
  • Consider natural language processing techniques to extract diagnoses from unstructured secondary care data.
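
The linkage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the study: the regular expressions, the function names (`letter_mentions_psa`, `link_sources`), and the patient IDs and letter texts are all hypothetical, and a real pipeline would need a validated NLP approach with far more robust negation and context handling.

```python
import re

# Assumption: a deliberately narrow pattern. The abbreviation "PsA" is left out
# because a case-insensitive match would also hit "PSA" (prostate-specific antigen).
PSA_PATTERN = re.compile(r"\bpsoriatic\s+arthr(?:itis|opathy)\b", re.IGNORECASE)

# Assumption: a crude list of negation cues, much simpler than clinical NLP tools.
NEGATION = re.compile(r"\b(?:no|not|without|unlikely|excluded|ruled? out)\b", re.IGNORECASE)


def letter_mentions_psa(text: str) -> bool:
    """Return True if the letter contains at least one non-negated PsA mention."""
    for match in PSA_PATTERN.finditer(text):
        # Inspect only the part of the current sentence that precedes the mention.
        sentence_start = text.rfind(".", 0, match.start()) + 1
        preceding = text[sentence_start:match.start()]
        if not NEGATION.search(preceding):
            return True
    return False


def link_sources(primary_coded: set[str], letters: dict[str, list[str]]) -> dict[str, set[str]]:
    """Cross-classify patients by primary care code and text-mined letter status."""
    with_letters = set(letters)
    letter_positive = {
        pid for pid, texts in letters.items()
        if any(letter_mentions_psa(t) for t in texts)
    }
    return {
        # Coded in primary care and supported by a letter.
        "both": primary_coded & letter_positive,
        # Coded, has letters, but no letter supports PsA: candidate false positives.
        "primary_only": (primary_coded & with_letters) - letter_positive,
        # Letter supports PsA but no primary care code: candidate false negatives.
        "letters_only": letter_positive - primary_coded,
        # Coded but no letters available, so the code cannot be checked this way.
        "unverified": primary_coded - with_letters,
    }


# Hypothetical example data (not from the study).
example_primary = {"p1", "p2", "p4"}
example_letters = {
    "p1": ["Diagnosis: psoriatic arthritis, well controlled on methotrexate."],
    "p2": ["No evidence of psoriatic arthritis. Likely osteoarthritis."],
    "p3": ["Seen in clinic. Established psoriatic arthropathy."],
}
example_result = link_sources(example_primary, example_letters)
for group, patients in example_result.items():
    print(group, sorted(patients))
```

Keeping the "unverified" group separate matters in this sketch: a coded patient with no outpatient letters is neither confirmed nor refuted, so counting them as false positives would overstate misclassification.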

Management

  • Incorporate comprehensive data sources for accurate disease identification to inform clinical decision-making and resource allocation.

Monitoring & Follow-up

  • Regularly validate and update coding algorithms against secondary care data to detect and correct misclassification.
  • Monitor prevalence estimates for changes after data linkage and validation.
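
Validating a code list against secondary care data usually comes down to two quantities: the positive predictive value (PPV, the share of coded patients who truly have the disease) and the sensitivity (the share of true cases that carry a code). The sketch below computes both from validation counts. The class name `ValidationCounts` and the counts themselves are hypothetical and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Counts from comparing primary care codes with a reference standard."""
    true_positive: int   # coded, and confirmed by the reference standard
    false_positive: int  # coded, but not confirmed
    false_negative: int  # confirmed, but not coded


def positive_predictive_value(counts: ValidationCounts) -> float:
    """Share of coded patients who are confirmed cases."""
    coded = counts.true_positive + counts.false_positive
    if coded == 0:
        raise ValueError("no coded patients in the validation sample")
    return counts.true_positive / coded


def sensitivity(counts: ValidationCounts) -> float:
    """Share of confirmed cases that carry a primary care code."""
    confirmed = counts.true_positive + counts.false_negative
    if confirmed == 0:
        raise ValueError("no confirmed cases in the validation sample")
    return counts.true_positive / confirmed


# Hypothetical validation sample (illustrative numbers only).
sample = ValidationCounts(true_positive=90, false_positive=10, false_negative=60)
print(f"PPV: {positive_predictive_value(sample):.2f}")
print(f"Sensitivity: {sensitivity(sample):.2f}")
```

Re-running such a check whenever code lists or letter-processing rules change gives a simple way to see whether misclassification is drifting over time.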

Risks

  • Relying solely on primary care coding may lead to underestimation of disease prevalence and misinformed clinical and research conclusions.
  • Without automated text mining, validating diagnoses may require resource-intensive manual review of records.

Patient & Prescribing Data

Adults with psoriatic arthritis identified via linked primary and secondary care EHR data

Accurate identification of PsA cases through data linkage can improve understanding of treatment patterns and outcomes by ensuring correct case ascertainment.

Clinical Best Practices

  • Validate EHR code lists with multiple data sources to reduce misclassification bias.
  • Utilize natural language processing to extract diagnostic information from unstructured secondary care records.
  • Adjust prevalence and epidemiological estimates based on validation findings to reflect true disease burden.
  • Adhere to STROBE and RECORD guidelines for transparent reporting of observational studies using routinely collected health data.
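
One common way to adjust a code-based prevalence for misclassification uses the PPV and sensitivity of the code list: multiplying the observed prevalence by the PPV removes false positives, and dividing by the sensitivity adds back the missed cases. The sketch below shows that arithmetic. It is a generic correction, not necessarily the adjustment used in the study, and the function name `adjusted_prevalence` and the input values are hypothetical.

```python
def adjusted_prevalence(observed: float, ppv: float, sensitivity: float) -> float:
    """Correct a code-based prevalence for false positives and false negatives.

    observed:    proportion of the population carrying the diagnostic code
    ppv:         proportion of coded patients who truly have the disease
    sensitivity: proportion of true cases who carry the code
    """
    if not 0.0 <= observed <= 1.0:
        raise ValueError("observed prevalence must be between 0 and 1")
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must be between 0 and 1")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be greater than 0 and at most 1")
    # observed * ppv is the prevalence of true coded cases; dividing by
    # sensitivity scales it up to account for true cases with no code.
    return observed * ppv / sensitivity


# Illustrative inputs only (not estimates from the study).
estimate = adjusted_prevalence(observed=0.002, ppv=0.9, sensitivity=0.6)
print(f"Adjusted prevalence: {estimate:.2%}")
```

When sensitivity is low relative to PPV, as when many true cases are missing from primary care coding, the adjusted estimate is higher than the observed one, which is the direction of change reported here.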
