Can AI Extract Echo Report Data as Accurately as Expert Annotation?

In a study of 50 free-text echocardiography reports, GPT-5 mini extracted 55 cardiovascular fields with 92.5% exact-match agreement with expert annotation.

By Kerri Miller
June 26, 2026

Conexiant

Objective:

To evaluate the accuracy of a large language model (GPT-5 mini) in extracting structured cardiovascular data from free-text echocardiography reports.

Approach:

Study Design: Fifty-five cardiovascular fields were extracted from de-identified reports in the MIMIC-III EchoNotes dataset, and model outputs were compared with expert annotations.
Data Extraction: A board-certified echocardiographer annotated 50 reports, GPT-5 mini extracted the same fields from them, and a blinded cardiologist adjudicated discrepancies.
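
To make the comparison step concrete, here is a minimal Python sketch of how discrepancies between an expert annotation and a model extraction could be collected for adjudication. It is an illustration only, not the study's pipeline: the field names, values, and the find_discrepancies helper are hypothetical, and a field set to None is assumed to mean "not documented."

```python
from typing import Optional

# Hypothetical record type: field name -> extracted value (None = not documented).
Record = dict[str, Optional[str]]


def find_discrepancies(
    expert: Record, model: Record
) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Return (field, expert_value, model_value) for each field where the two differ."""
    fields = sorted(set(expert) | set(model))
    return [
        (field, expert.get(field), model.get(field))
        for field in fields
        if expert.get(field) != model.get(field)
    ]


# Made-up example values, for illustration only.
expert = {"lvef_percent": "55", "mitral_regurgitation": "mild", "aortic_stenosis": None}
model = {"lvef_percent": "55", "mitral_regurgitation": "moderate", "aortic_stenosis": "none"}

for field, expert_value, model_value in find_discrepancies(expert, model):
    print(f"{field}: expert={expert_value!r} model={model_value!r}")
```

In this made-up example, the aortic_stenosis row is the kind of case an adjudicator would have to classify: the model recorded a normal finding that the annotator left blank.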

Key Findings:

The large language model achieved 92.5% exact-match agreement with expert annotation, with precision ranging from 96% to 98% across categories and recall ranging from 85% to 95%.
The model identified 120 additional clinical values not documented by human annotators, reflecting both over-extraction of normal findings and human annotation errors.
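
For readers unfamiliar with these metrics, the Python sketch below shows one common way to compute exact-match agreement, precision, and recall for field extraction. The article does not describe how the study calculated them, so the definitions here are assumptions: a value counts as correct only if it matches the expert annotation exactly, and None means the field was not documented. The extraction_metrics function and the sample values are hypothetical.

```python
from typing import Optional

Record = dict[str, Optional[str]]


def extraction_metrics(expert: Record, model: Record) -> dict[str, float]:
    """Compute exact-match agreement, precision, and recall over all fields.

    Assumed definitions (not taken from the study):
      exact_match = fields where expert and model values are identical / all fields
      precision   = correct model values / all values the model extracted
      recall      = correct model values / all values the expert documented
    """
    fields = set(expert) | set(model)
    if not fields:
        return {"exact_match": 0.0, "precision": 0.0, "recall": 0.0}

    identical = sum(1 for f in fields if expert.get(f) == model.get(f))
    correct = sum(
        1 for f in fields
        if expert.get(f) is not None and expert.get(f) == model.get(f)
    )
    extracted = sum(1 for f in fields if model.get(f) is not None)
    documented = sum(1 for f in fields if expert.get(f) is not None)

    return {
        "exact_match": identical / len(fields),
        "precision": correct / extracted if extracted else 0.0,
        "recall": correct / documented if documented else 0.0,
    }


# Made-up example values, for illustration only.
expert = {"lvef_percent": "55", "mr_severity": "mild", "as_severity": None, "rv_size": "normal"}
model = {"lvef_percent": "55", "mr_severity": "moderate", "as_severity": "none", "rv_size": "normal"}

print(extraction_metrics(expert, model))
```

Under these assumed definitions, a value the model extracts that the annotator did not record lowers precision, even in cases where adjudication later shows that the annotator missed it.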

Interpretation:

The model extracted echocardiography data with high agreement with expert annotation, but its tendency to over-extract normal findings is a potential issue.

Limitations:

The study did not report on prospective clinical use, diagnostic accuracy, patient outcomes, or workflow-safety outcomes.
Performance varied across exam types, notably for stress echocardiograms, whose reports have lower information density.