Automated identification of fall-related injuries in unstructured clinical notes - Report - MDSpire


By Wendong Ge, Lilian M Godeiro Coelho, Maria A Donahue, Hunter J Rice, Deborah Blacker, John Hsu, Joseph P Newhouse, Sonia Hernández-Díaz, Sebastien Haneuse, Brandon Westover, Lidia M V R Moura

July 26, 2024


Automated NLP Detection of Fall-Related Injuries in Clinical Notes

Overview

This study developed and validated natural language processing (NLP) models to identify fall-related injuries (FRIs) in the unstructured clinical documentation of older adults. Of the five models tested (RoBERTa, vanilla BERT, ClinicalBERT, DistilBERT, and a support vector machine), RoBERTa performed best, with a precision of 0.90, recall of 0.91, and F1 score of 0.91, highlighting its potential to support large-scale clinical research on FRIs.

Background

Fall-related injuries are a leading cause of hospitalization and emergency visits among adults aged 65 and older, incurring substantial healthcare costs and threatening patient independence. Manual review of electronic health records (EHRs) to identify FRIs is labor-intensive and error-prone, which motivates the use of automated natural language processing. Earlier approaches based on support vector machines (SVMs) have since been complemented by transformer-based models such as BERT, which have shown promise in detecting various medical conditions from unstructured text. This study aimed to apply these advances to FRI identification in the EHR data of a large healthcare system.

Data Highlights

Metric | RoBERTa performance | 95% confidence interval
Precision | 0.90 | 0.88–0.91
Recall | 0.91 | 0.90–0.93
F1 score | 0.91 | 0.89–0.92
AUROC | 0.96 | 0.95–0.97
AUPR | 0.96 | 0.95–0.97
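The metrics in the table follow the standard definitions: precision is the share of paragraphs flagged as FRI that truly describe one, recall is the share of true FRI paragraphs that get flagged, and F1 is their harmonic mean. AUROC, by contrast, is threshold-free: it equals the probability that a randomly chosen positive paragraph receives a higher score than a randomly chosen negative one. As a quick consistency check, plugging the reported precision (0.90) and recall (0.91) into F1 = 2PR/(P + R) gives about 0.905, in line with the reported 0.91 given that all three figures are rounded. The minimal sketch below computes these quantities from binary labels and model scores; the 0.5 decision threshold, the function name, and the toy inputs are assumptions for illustration, not details from the study, and AUPR is omitted for brevity.

```python
from typing import Sequence


def classification_metrics(labels: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> dict[str, float]:
    """Precision, recall, F1 at a fixed threshold, plus rank-based AUROC.

    labels: 1 = paragraph describes a fall-related injury, 0 = it does not.
    scores: model probability that the paragraph describes an FRI.
    threshold: hypothetical decision cut-off (not taken from the study).
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # AUROC as the probability that a random positive outscores a random
    # negative (ties count as half), i.e. the Mann-Whitney U statistic
    # divided by (#positives * #negatives).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        auroc = wins / (len(pos) * len(neg))
    else:
        auroc = float("nan")

    return {"precision": precision, "recall": recall, "f1": f1, "auroc": auroc}


if __name__ == "__main__":
    # Toy labels and scores, invented purely for illustration.
    y = [1, 1, 1, 0, 0, 1, 0, 0]
    s = [0.95, 0.80, 0.40, 0.30, 0.60, 0.70, 0.10, 0.20]
    print(classification_metrics(y, s))
```

On the toy inputs, precision, recall, and F1 all come out to 0.75 and AUROC to 0.9375. The pairwise loop is quadratic and only meant for illustration; for a dataset the size of the study's, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.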

Key Findings

  • RoBERTa outperformed other NLP models including vanilla BERT, ClinicalBERT, DistilBERT, and SVM in detecting FRIs from clinical notes.
  • The model achieved high precision (0.90) and recall (0.91), indicating accurate and comprehensive identification of FRIs.
  • Training involved a three-stage process: masked language modeling, general boolean question-answering, and FRI-specific question-answering.
  • The study utilized a large dataset of 154,949 paragraphs containing FRI-related keywords from 1,669 patients aged 65 and older.
  • Expert manual labeling of 5,000 paragraphs, together with validated pattern-based annotations, provided benchmark-standard and validated-standard labels for model training and testing.
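The findings above describe two data-preparation ideas: restricting the corpus to paragraphs that contain FRI-related keywords, and framing the final task as a yes/no (boolean) question posed about each paragraph. The plain-Python sketch below illustrates only that preparation step. The keyword list, the question wording, the `QAExample` structure, and the function names are all hypothetical, since this summary does not give the study's actual keywords or prompts; the RoBERTa model and its masked-language-modeling and question-answering training stages are not reproduced.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword list; the study's actual FRI-related keywords are not
# given in this summary.
FRI_KEYWORDS = ["fall", "falls", "fell", "fallen", "slipped", "tripped", "fracture"]

# Hypothetical yes/no question posed for every candidate paragraph.
FRI_QUESTION = "Does this paragraph describe an injury caused by a fall?"

# Whole-word, case-insensitive match, so "fell" matches but "fellow" does not.
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FRI_KEYWORDS)) + r")\b", re.IGNORECASE
)


@dataclass
class QAExample:
    """One boolean question-answering input: question + paragraph (+ label)."""
    question: str
    passage: str
    label: Optional[bool] = None  # None while the paragraph is unlabeled


def split_paragraphs(note: str) -> list[str]:
    """Split a clinical note into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", note) if p.strip()]


def candidate_paragraphs(note: str) -> list[str]:
    """Keep only paragraphs that contain at least one FRI-related keyword."""
    return [p for p in split_paragraphs(note) if _KEYWORD_RE.search(p)]


def to_qa_examples(note: str) -> list[QAExample]:
    """Turn each keyword-matching paragraph into an unlabeled yes/no example."""
    return [QAExample(FRI_QUESTION, p) for p in candidate_paragraphs(note)]


if __name__ == "__main__":
    note = (
        "Patient seen for routine follow-up of hypertension.\n\n"
        "Reports she slipped on ice last week and has wrist pain; "
        "X-ray shows a distal radius fracture.\n\n"
        "Discussed fellowship referral."
    )
    for ex in to_qa_examples(note):
        print(ex.question, "|", ex.passage)
```

In this toy note only the middle paragraph survives the keyword filter. Keyword matching is deliberately high-recall and low-precision: it narrows the candidate pool cheaply, and the question-answering model is then responsible for deciding whether a matched paragraph actually describes a fall-related injury.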

Clinical Implications

The implementation of advanced NLP models like RoBERTa can significantly streamline the identification of fall-related injuries in unstructured clinical documentation, reducing reliance on manual chart review. This automation facilitates more efficient and accurate large-scale epidemiological studies and quality improvement initiatives targeting fall prevention in older adults. Clinicians and researchers may leverage these tools to better monitor and address FRIs within healthcare systems.

Conclusion

RoBERTa-based NLP models provide a reliable and efficient method for detecting fall-related injuries in unstructured clinical notes, offering a valuable tool to enhance clinical research and potentially improve patient care outcomes related to falls in older adults.

References

  1. Mass General Brigham Study 2024: Automated Detection of Fall-Associated Injuries in Unstructured Clinical Documentation
