GPT-4.1 and Llama 3.3 70B fail to detect clinically relevant errors in radiology reports in zero-shot evaluation

By Tugba Akinci D’Antonoli, Lisa C. Adams, Jannik Lübberstedt, Markus M. Graf, Christian J. Mertens, Felix Busch, Sebastian Ziegelmayer, Marcus R. Makowski, Keno Bressem, Ina Luiken

June 19, 2026
