Information extraction from weakly structured radiological reports with natural language queries

By Amin Dada, Tim Leon Ufer, Moon Kim, Max Hasin, Nicola Spieker, Michael Forsting, Felix Nensa, Jan Egger, Jens Kleesiek

July 28, 2023

Extracting Insights from Loosely Structured Radiology Reports Using Natural Language Queries

Overview

This study evaluates German BERT models, pre-trained on a large corpus of radiology reports, for extracting information through reading comprehension question answering (RCQA). The models were pre-trained on 857,783 reports and fine-tuned on 1,223 annotated brain CT reports. Because users pose free-form, clinically relevant questions, the approach is not bound to the fixed label sets that classification and named entity recognition require.

Background

Radiology reports are critical for clinical decision-making but are often weakly structured and vary widely in style, complicating the comparison of findings over time. Traditional methods using classification or named entity recognition are limited by predefined categories and lack positional information within texts. Recent advances in natural language processing, particularly transformer-based models like BERT, have shown promise in extracting meaningful information from clinical texts. This study builds on these advances by developing and fine-tuning German BERT models specifically for radiology report question answering to improve information accessibility.
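To make the idea of positional information concrete, the sketch below shows one common way to store a reading comprehension annotation: the answer is a verbatim span of the report, recorded together with its character offset. The class name `RCQAExample`, its field names, and the English sample report are all hypothetical and chosen for illustration; the study's actual annotation schema and its German report texts are not given in this summary.

```python
from dataclasses import dataclass


@dataclass
class RCQAExample:
    """One span-based QA annotation (hypothetical schema, not the study's)."""
    context: str       # report text (invented sample, not real patient data)
    question: str      # natural language query
    answer_text: str   # answer copied verbatim from the context
    answer_start: int  # character offset of the answer within the context

    def answer_end(self) -> int:
        # exclusive end offset of the answer span
        return self.answer_start + len(self.answer_text)

    def is_consistent(self) -> bool:
        # the stored offset must point at exactly the annotated answer text
        return self.context[self.answer_start:self.answer_end()] == self.answer_text


report = "No evidence of intracranial hemorrhage. Old infarct in the left MCA territory."
answer = "left MCA territory"
example = RCQAExample(
    context=report,
    question="Where is the infarct located?",
    answer_text=answer,
    answer_start=report.index(answer),
)
print(example.answer_start, example.answer_end(), example.is_consistent())
```

Because the answer is tied to an offset rather than to a category label, the same report can be queried with any question whose answer appears in the text.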

Data Highlights

Dataset | Number of Documents | Words | Tokens | Modality / Scope
Radiology Reports (Essen University Hospital) | 857,783 reports | 92 million | 227 million | CT (70%), MRI (30%)
DocCheck Flexikon Medical Encyclopedia | 14,825 articles | 3.7 million | 7.6 million | Various specialties
Annotated Brain CT Reports (Fine-tuning) | 1,223 reports | Not specified | Not specified | Brain CT
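As a rough sanity check on the corpus sizes above, the following sketch derives average document length and tokens per word for the two pre-training corpora. It uses only the rounded figures from the table, so the results are approximate; the dictionary keys and the helper `corpus_ratios` are names introduced here for illustration.

```python
# Rounded figures as listed in the table above.
corpora = {
    "radiology_reports": {"documents": 857_783, "words": 92e6, "tokens": 227e6},
    "doccheck_flexikon": {"documents": 14_825, "words": 3.7e6, "tokens": 7.6e6},
}


def corpus_ratios(stats):
    """Derive per-document and per-word averages from the raw counts."""
    return {
        "words_per_document": stats["words"] / stats["documents"],
        "tokens_per_word": stats["tokens"] / stats["words"],
    }


ratios = {name: corpus_ratios(stats) for name, stats in corpora.items()}
for name, r in ratios.items():
    print(f"{name}: {r['words_per_document']:.0f} words/document, "
          f"{r['tokens_per_word']:.2f} tokens/word")
```

On these rounded numbers, a radiology report averages a little over 100 words, and each word is split into roughly 2.5 tokens, compared with about 2 tokens per word for the encyclopedia articles. The tokenizer behind the token counts is not stated in this summary, so the ratios should be read as indicative only.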

Key Findings

  • Radiology reports are weakly structured and vary significantly in style and terminology, complicating information extraction.
  • Previous NLP approaches using classification or named entity recognition are restricted to fixed categories and lack positional information within the text.
  • Transformer-based models, especially BERT, outperform traditional NLP methods in radiology text analysis across multiple languages.
  • The study utilized a large corpus of 857,783 German radiology reports for unsupervised pre-training of BERT models.
  • Fine-tuning was performed on a manually annotated RCQA dataset of 1,223 brain CT reports with questions formulated by trained medical staff and students.
  • The RCQA approach enables flexible, clinically relevant question answering without reliance on predefined categories or entity labels.
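The question answering step in the findings above can be sketched as span selection. BERT-style extractive QA models typically output one start score and one end score per token, and the answer is the span with the highest combined score. The function `best_span`, the sample tokens, and the score values below are made up for illustration; this is a minimal sketch of the general technique, not the study's implementation.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start_score + end_score,
    subject to start <= end and a span of at most max_answer_len tokens."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # only consider end positions at or after the start, within the length cap
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Invented tokens and scores standing in for a model's output on one report.
tokens = ["Old", "infarct", "in", "the", "left", "MCA", "territory", "."]
start_scores = [0.1, 0.3, 0.0, 0.2, 4.0, 1.0, 0.5, 0.0]
end_scores = [0.0, 0.2, 0.0, 0.1, 0.5, 1.5, 4.2, 0.3]

start, end = best_span(start_scores, end_scores)
answer = " ".join(tokens[start:end + 1])
print(start, end, answer)  # 4 6 left MCA territory
```

Since the output is a pair of positions in the report, the answer can be highlighted in place, which is what distinguishes this setup from assigning a label to the whole report.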

Clinical Implications

The use of BERT-based RCQA models can significantly reduce the effort required to extract relevant information from loosely structured radiology reports, facilitating longitudinal comparison of findings. This approach supports radiologists, referring physicians, and other healthcare providers by providing rapid access to precise information across multiple reports, potentially improving diagnostic accuracy and treatment decisions.

Conclusion

Pre-training and fine-tuning German BERT models on large radiology datasets enables effective natural language question answering, overcoming limitations of prior methods and enhancing clinical information retrieval from unstructured radiology reports.
