GPT-4.1 and Llama 3.3 70 fail to detect clinically relevant errors in radiology reports in zero-shot evaluation

By Tugba Akinci D’Antonoli, Lisa C. Adams, Jannik Lübberstedt, Markus M. Graf, Christian J. Mertens, Felix Busch, Sebastian Ziegelmayer, Marcus R. Makowski, Keno Bressem, Ina Luiken
June 19, 2026

European Radiology

