Improving disease misclassification and prevalence estimates by linking primary and secondary care electronic health records: an illustration from arthritis research
Enhancing Disease Classification Accuracy by Linking Primary and Secondary Care EHRs in Psoriatic Arthritis
Overview
This study demonstrated that linking primary care electronic health records (EHRs) with text-mined secondary care outpatient letters substantially improves the accuracy of psoriatic arthritis (PsA) prevalence estimates. Primary care data alone underestimated PsA prevalence by about half (0.13% observed versus 0.25% after adjustment), mainly because of false negatives that only became visible through linkage to secondary care data.
Background
Accurate disease classification in routinely collected EHR data is essential for reliable prevalence estimates and research. Primary care databases often rely on coded diagnoses, which may contain false positives and fail to capture all true cases, leading to misclassification. Secondary care outpatient diagnoses are typically recorded as unstructured free text, limiting their use in validation. Advances in natural language processing enable extraction of diagnostic information from these texts, offering an opportunity to improve case identification and correct prevalence estimates.
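To make the idea of extracting diagnoses from free text concrete, the following is a minimal rule-based sketch. It is not the study's text-mining method, which is not described in this summary; the patterns, the negation cues, and the function name `mentions_psa` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns, not taken from the study.
# "PsA" is matched case-sensitively so that "PSA" (prostate-specific antigen)
# is not mistaken for psoriatic arthritis.
PSA_PATTERN = re.compile(r"\b(?:(?i:psoriatic\s+arthr(?:itis|opathy))|PsA)\b")

# Hypothetical negation cues checked in the text that precedes a mention.
NEGATION_CUE = re.compile(
    r"\b(?:no|not|without|unlikely|excluded?|rule[sd]?\s+out)\b",
    re.IGNORECASE,
)


def mentions_psa(letter: str) -> bool:
    """Return True if the letter contains at least one non-negated PsA mention."""
    for match in PSA_PATTERN.finditer(letter):
        # Find where the current sentence (or line) starts.
        last_stop = letter.rfind(".", 0, match.start())
        last_newline = letter.rfind("\n", 0, match.start())
        sentence_start = max(last_stop, last_newline) + 1
        preceding = letter[sentence_start:match.start()]
        # Accept the mention only if no negation cue precedes it in the sentence.
        if not NEGATION_CUE.search(preceding):
            return True
    return False
```

A production pipeline would need far more than this, for example handling of family history, differential diagnoses, and abbreviations, and it would have to be validated against manually reviewed letters.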
Data Highlights
| Metric | Value | 95% CI |
| --- | --- | --- |
| Primary care PsA cases identified | 245 | n/a |
| Primary care population | 188,286 adults | n/a |
| Observed PsA prevalence (primary care only) | 0.13% | 0.11% - 0.15% |
| Subgroup attending hospital rheumatology clinic | 7,532 patients | n/a |
| Primary care PsA codes in subgroup | 202 | n/a |
| True positives confirmed in subgroup | 188 | n/a |
| False positives in subgroup | 14 | n/a |
| False negatives (hospital-diagnosed, no primary care code) | 196 | n/a |
| Adjusted PsA prevalence (corrected for misclassification) | 0.25% | 0.21% - 0.28% |
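The observed prevalence and its interval can be checked directly from the counts above. The sketch below assumes a normal-approximation (Wald) interval; this summary does not say which interval method the study used, so that choice is an assumption, although it reproduces the reported figures after rounding.

```python
import math


def prevalence_with_wald_ci(cases: int, population: int, z: float = 1.96):
    """Return (prevalence, lower, upper) using a Wald interval.

    The Wald method is an assumption; the source does not name its method.
    """
    p = cases / population
    se = math.sqrt(p * (1 - p) / population)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Counts from the table: 245 coded PsA cases among 188,286 adults.
p, lower, upper = prevalence_with_wald_ci(245, 188_286)
print(f"{100 * p:.2f}% (95% CI {100 * lower:.2f}% to {100 * upper:.2f}%)")
# prints "0.13% (95% CI 0.11% to 0.15%)"
```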
Key Findings
Primary care EHR data alone identified 245 PsA cases among 188,286 adults, yielding an observed prevalence of 0.13%.
In a subgroup of 7,532 patients attending hospital rheumatology clinics, 202 had a primary care PsA code; validation confirmed 188 of these as true positives and identified 14 as false positives.
Linkage with text-mined secondary care outpatient letters enabled identification of both false positives and false negatives, including 196 patients with a hospital PsA diagnosis but no primary care PsA code.
Adjusting for misclassification using linked data nearly doubled the estimated PsA prevalence, from 0.13% to 0.25% (95% CI 0.21% to 0.28%).
Text mining of outpatient letters compensates for the lack of coded secondary care diagnoses in national datasets.
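The adjustment can be illustrated with a standard misclassification correction: multiply the observed prevalence by the positive predictive value (PPV) and divide by the sensitivity. This summary does not state the formula the study used, so the correction below is an assumption; it also assumes that the PPV and sensitivity measured in the rheumatology clinic subgroup apply to the whole primary care population. With the reported counts it gives a value that rounds to the reported 0.25%.

```python
def adjusted_prevalence(observed: float, true_pos: int, false_pos: int, false_neg: int):
    """Correct an observed prevalence for misclassification.

    Assumed formula (not confirmed by the source):
        adjusted = observed * PPV / sensitivity
    Returns (adjusted, ppv, sensitivity).
    """
    ppv = true_pos / (true_pos + false_pos)          # coded cases that are real
    sensitivity = true_pos / (true_pos + false_neg)  # real cases that are coded
    return observed * ppv / sensitivity, ppv, sensitivity


observed = 245 / 188_286  # primary care only
adjusted, ppv, sensitivity = adjusted_prevalence(
    observed, true_pos=188, false_pos=14, false_neg=196
)
print(f"PPV {100 * ppv:.1f}%, sensitivity {100 * sensitivity:.1f}%")
print(f"Adjusted prevalence {100 * adjusted:.2f}%")
```

The low sensitivity, about 49%, is what drives the correction: roughly half of the hospital-diagnosed cases in the subgroup had no primary care code. Reproducing the reported confidence interval would additionally require propagating the uncertainty in PPV and sensitivity, which is not attempted here.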
Clinical Implications
Clinicians and researchers should be aware that relying solely on primary care coded data may substantially underestimate disease prevalence due to false negatives. Integrating secondary care data, especially through text mining of outpatient letters, can improve case ascertainment and provide more accurate epidemiological estimates. This approach supports better disease surveillance and resource allocation in clinical practice.
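In practical terms, the gain in case ascertainment comes from comparing two sets of patients within the linked subgroup: those with a primary care code and those with a diagnosis found in hospital letters. The sketch below shows that comparison with set operations. The function name and patient identifiers are hypothetical, and the hospital diagnosis is treated as the reference standard, which is an assumption.

```python
def classify_linked(primary_coded: set, hospital_confirmed: set) -> dict:
    """Split linked patients into true positives, false positives, false negatives.

    Both sets must be restricted to patients seen in the hospital clinic,
    so that the absence of a hospital diagnosis is informative.
    """
    return {
        "true_positive": primary_coded & hospital_confirmed,
        "false_positive": primary_coded - hospital_confirmed,
        "false_negative": hospital_confirmed - primary_coded,
    }


# Hypothetical patient identifiers, for illustration only.
result = classify_linked(
    primary_coded={"p1", "p2", "p3"},
    hospital_confirmed={"p2", "p3", "p4", "p5"},
)
counts = {label: len(ids) for label, ids in result.items()}
print(counts)
# prints "{'true_positive': 2, 'false_positive': 1, 'false_negative': 2}"
```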
Conclusion
Linking primary and secondary care EHRs with advanced text-mining techniques significantly enhances the accuracy of disease classification and prevalence estimates for psoriatic arthritis. This methodology addresses limitations of primary care coding and highlights the importance of comprehensive data integration in epidemiological research.