Sepsis and septic shock case identification from electronic health records: an open-source workflow and comparison of cohorts by criteria

By Seth R. Bauer, Lyla Mourany, Paul R. Gunsalus, Alex Milinovich, Sandra L. Kane-Gill, Xiaofeng Wang, Yasir Tarabichi, Vidula Vachharajani, and Jarrod E. Dalton

December 12, 2025

Open-Source Workflow for Sepsis and Septic Shock Identification Using EHR Data

Overview

This study developed and openly shared a comprehensive data workflow to identify sepsis and septic shock cases from electronic health records (EHR) using CDC Adult Sepsis Event (ASE), Sepsis-3 clinical criteria, and ICD codes. The workflow was applied to over a decade of patient encounters across 25 hospitals, enabling transparent comparison of cohorts identified by different criteria.

Background

Identifying sepsis cases is essential to the performance improvement programs recommended by the 2021 Surviving Sepsis Campaign Guidelines. Several clinical criteria exist, including CDC ASE and Sepsis-3, but there is no consensus on the optimal approach. ICD coding is commonly used but is limited by documentation and coding biases. Implementing reproducible, transparent workflows for sepsis case identification from EHR data remains challenging, and few published workflows provide complete data extraction and transformation code.

Data Highlights

The study queried patient encounters from 2012 to 2024 across 25 emergency departments and hospitals within the Cleveland Clinic integrated health system. An encounter was included only if it had at least one microbial culture and at least one antimicrobial dose. The individual query outputs ranged in size from 37,404 to 134,853,957 observations. The workflow covered data ingestion, transformation, cleaning, and storage in a relational database, implemented in R and Quarto markdown, with all code publicly available for reproducibility.
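The inclusion rule described above can be illustrated with a minimal sketch. This is not the study's code (the published workflow is written in R); the function name, the tuple layout of the event records, and the sample encounter IDs are all hypothetical and chosen only for illustration.

```python
def eligible_encounters(culture_events, antimicrobial_events):
    """Return the encounter IDs that have at least one microbial culture
    AND at least one antimicrobial dose.

    Both arguments are iterables of (encounter_id, description) tuples.
    This record layout is an assumption for the sketch, not the study's schema.
    """
    with_culture = {enc_id for enc_id, _ in culture_events}
    with_antimicrobial = {enc_id for enc_id, _ in antimicrobial_events}
    # Set intersection keeps only encounters present in both sources.
    return with_culture & with_antimicrobial


# Hypothetical toy data: E2 has no antimicrobial dose, E3 has no culture.
cultures = [
    ("E1", "blood culture"),
    ("E2", "urine culture"),
    ("E4", "blood culture"),
]
doses = [
    ("E1", "ceftriaxone"),
    ("E3", "vancomycin"),
    ("E4", "cefepime"),
    ("E4", "vancomycin"),
]

eligible = eligible_encounters(cultures, doses)
print(sorted(eligible))
```

Working with sets of encounter IDs means that repeated cultures or doses within one encounter do not affect eligibility, which matches the "at least one" wording of the rule.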

Key Findings

  • An open-source data workflow was developed encompassing data extraction, transformation, and application of CDC ASE, Sepsis-3, and ICD criteria for sepsis and septic shock identification.
  • The workflow was applied to a large, multi-hospital EHR dataset spanning more than a decade, demonstrating its use at health-system scale.
  • Detailed annotated programming code and methodological descriptions were provided to enhance transparency and reproducibility.
  • Comparison of cohorts identified by CDC ASE, Sepsis-3, and ICD criteria highlighted differences in case identification approaches.
  • The study addressed common gaps in prior literature by including data extraction code and clarifying ambiguous criteria components.
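One way to picture the cohort comparison mentioned in the findings above is as pairwise overlap between the sets of encounters flagged by each criterion. The sketch below assumes each criterion has already been reduced to a set of encounter IDs; the function name, the output keys, the Jaccard index as a summary measure, and the toy cohorts are illustrative assumptions and are not taken from the study or its R code.

```python
from itertools import combinations


def compare_cohorts(cohorts):
    """Compute pairwise overlap between cohorts.

    `cohorts` maps a criterion name to a set of encounter IDs.
    Returns a dict keyed by (name_a, name_b), with names in sorted order.
    """
    results = {}
    for a, b in combinations(sorted(cohorts), 2):
        both = cohorts[a] & cohorts[b]
        union = cohorts[a] | cohorts[b]
        results[(a, b)] = {
            "both": len(both),                          # flagged by both criteria
            "first_only": len(cohorts[a] - cohorts[b]),   # flagged only by a
            "second_only": len(cohorts[b] - cohorts[a]),  # flagged only by b
            # Jaccard index: shared encounters / encounters flagged by either.
            "jaccard": len(both) / len(union) if union else 0.0,
        }
    return results


# Hypothetical toy cohorts, not study results.
cohorts = {
    "CDC ASE": {"E1", "E2", "E3"},
    "Sepsis-3": {"E2", "E3", "E4", "E5"},
    "ICD": {"E3", "E6"},
}

overlap = compare_cohorts(cohorts)
for pair, stats in overlap.items():
    print(pair, stats)
```

Reporting the encounters unique to each criterion alongside the shared count shows where two definitions disagree, which a single agreement statistic would hide.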

Clinical Implications

Clinicians and researchers can use the openly available workflow to identify sepsis and septic shock cases from EHR data in a consistent way, supporting quality improvement and research. Transparent and reproducible methods can reduce variability in case identification, improving comparability across studies and healthcare systems. Adopting such workflows supports adherence to sepsis performance improvement recommendations and may improve surveillance accuracy.

Conclusion

This work provides a rigorously developed, fully transparent, and publicly accessible workflow for sepsis and septic shock case identification from EHR data, enabling reproducible research and quality improvement efforts. The comparative analysis underscores the importance of standardized criteria application in sepsis surveillance.

References

  1. Surviving Sepsis Campaign Guidelines 2021 -- Performance Improvement Program for Sepsis
  2. CDC Adult Sepsis Event Criteria -- Sepsis Surveillance Recommendations
  3. Sepsis-3 Clinical Criteria -- Third International Consensus Definition for Sepsis and Septic Shock
  4. OpenSep Pipeline -- Open-Source Sepsis-3 Identification Workflow
