Extracting Social Determinants of Health From Electronic Health Records: Development and Comparison of Rule-Based and Large Language Model Methods - Report - MDSpire

By Bo Wang, Dia Kabir, Cheryl Renee Clark, Karmel W Choi, Jordan W Smoller

May 19, 2026

Clinical Report: Analyzing Social Determinants of Health in EHRs

Overview

This study evaluates methods for extracting social determinants of health (SDoH) from electronic health records (EHRs) using rule-based and large language model (LLM) techniques. The findings highlight the potential of LLMs to improve the identification of underexplored SDoH domains with minimal training requirements.

Background

SDoH account for a substantial portion of health outcomes and contribute to health disparities. Although EHRs could be a valuable source of this information, SDoH are often underdocumented in them. This study addresses the need for effective extraction methods that make EHRs more useful for understanding and addressing health disparities.

Key Findings

  • Developed methods to identify seven SDoH domains from clinical text.
  • Emphasized less-explored SDoH factors such as social resources and health insurance status.
  • Introduced a fine-grained classification system for SDoH, including subcategories for health insurance coverage.
  • Combined rule-based prescreening with LLMs to improve SDoH extraction.
  • Highlighted the importance of contextual attributes in defining positive SDoH cases.
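
As a rough illustration of how rule-based prescreening can be paired with an LLM, the sketch below first filters sentences with keyword patterns and then passes only the candidates to a classifier. It is a minimal sketch and not the study's implementation: the patterns, the domain keys, the function names, and the experiencer check (whether the statement is about the patient or a family member, used here as one example of a contextual attribute) are all assumptions. The `toy_classifier` function is a hypothetical stand-in for a real LLM call.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical keyword patterns; the study's actual lexicon is not given in this report.
PRESCREEN_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "health_insurance": re.compile(r"\b(uninsured|insurance|medicaid|medicare)\b", re.I),
    "social_resources": re.compile(r"\b(lives alone|social support|caregiver)\b", re.I),
}

# Assumed contextual attribute: mentions of relatives suggest the patient is not the experiencer.
FAMILY_TERMS = re.compile(r"\b(mother|father|brother|sister|spouse)\b", re.I)


def split_sentences(note: str) -> List[str]:
    """Naive sentence splitter: breaks after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", note) if s.strip()]


def prescreen(note: str) -> List[Tuple[str, str]]:
    """Return (domain, sentence) pairs for sentences matching a domain pattern."""
    hits: List[Tuple[str, str]] = []
    for sentence in split_sentences(note):
        for domain, pattern in PRESCREEN_PATTERNS.items():
            if pattern.search(sentence):
                hits.append((domain, sentence))
    return hits


def extract_sdoh(
    note: str, classify: Callable[[str, str], bool]
) -> List[Tuple[str, str]]:
    """Keep only the prescreened candidates that the classifier confirms."""
    return [(d, s) for d, s in prescreen(note) if classify(d, s)]


def toy_classifier(domain: str, sentence: str) -> bool:
    """Stand-in for an LLM call: rejects sentences that mention a family member."""
    return FAMILY_TERMS.search(sentence) is None


if __name__ == "__main__":
    example = "Patient is uninsured and lives alone. Her mother has Medicaid. Denies chest pain."
    for domain, sentence in extract_sdoh(example, toy_classifier):
        print(domain, "|", sentence)
```

In a design like this, the prescreen keeps the number of LLM calls small, and the classifier decides whether a keyword match is a true positive for the patient. Swapping `toy_classifier` for a function that prompts an LLM would leave the rest of the pipeline unchanged.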

Clinical Implications

The study suggests that integrating LLMs with rule-based systems can enhance the extraction of SDoH from clinical notes, potentially improving risk stratification and clinical decision-making. Healthcare providers should consider adopting these methods to better capture and address social needs in patient care.

Conclusion

The findings underscore the promise of LLMs in extracting critical SDoH information from EHRs, paving the way for improved population health research and clinical applications.
