Radiologists Tested on AI X-Rays

A study found that synthetic radiographs were often difficult to distinguish from real images, with radiologists telling them apart correctly about 75% of the time.

By Doug Brunk
April 1, 2026

Conexiant

Objective:

To evaluate radiologists' ability to distinguish AI-generated radiographs from real ones and to consider the implications for clinical practice.

Approach:

Key Findings:

Radiologists achieved 75% accuracy in distinguishing GPT-4o-generated synthetic radiographs from real ones.
Diagnostic accuracy for identifying abnormalities was high for both image types, reaching 92% for synthetic and 91% for real images.
Experience level did not significantly affect performance, although musculoskeletal radiologists performed better, at 83% accuracy.
None of the large language models (LLMs) tested identified all synthetic radiographs; GPT-4o achieved 85% accuracy.

Interpretation:

The study highlights the challenges in detecting increasingly realistic synthetic medical images and underscores the urgent need for improved training and safeguards in clinical practice.

Limitations:

The data set was relatively small, and excluding synthetic images with obvious AI errors may have made detection harder, potentially skewing the results.
The equal proportion of synthetic and real images does not reflect real-world prevalence, which could also affect detection accuracy.
Using GPT-4o for both generating and detecting synthetic images may have introduced bias, raising questions about the validity of the findings.

Conclusion:

The findings underscore the risks that synthetic medical images pose in clinical settings and suggest the need for strategies such as watermarking, provenance tracking, and automated detection tools to improve safety.
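Of the safeguards named in the conclusion, provenance tracking is the simplest to illustrate. The Python sketch below is a minimal, hypothetical example, not a method from the study: it records a SHA-256 digest of an image's exact bytes at acquisition and later checks whether an image matches a registered digest. The class names, the `device_id` and `acquired_at` fields, and the in-memory registry are illustrative assumptions. Real deployments would more likely rely on cryptographically signed provenance metadata (for example, the C2PA standard or DICOM digital signatures) than on a bare lookup table.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical metadata captured when an image is acquired."""
    device_id: str    # illustrative identifier for the acquiring scanner
    acquired_at: str  # ISO 8601 timestamp, kept as text for simplicity


class ProvenanceRegistry:
    """Toy in-memory registry mapping image digests to acquisition records."""

    def __init__(self) -> None:
        self._records: Dict[str, ProvenanceRecord] = {}

    @staticmethod
    def digest(image_bytes: bytes) -> str:
        # SHA-256 over the exact bytes: any change to the file changes the digest.
        return hashlib.sha256(image_bytes).hexdigest()

    def register(self, image_bytes: bytes, record: ProvenanceRecord) -> str:
        # Called at acquisition time; returns the digest used as the lookup key.
        key = self.digest(image_bytes)
        self._records[key] = record
        return key

    def verify(self, image_bytes: bytes) -> Optional[ProvenanceRecord]:
        # Returns the acquisition record if these exact bytes were registered;
        # None means unknown origin (possibly synthetic, altered, or never logged).
        return self._records.get(self.digest(image_bytes))


if __name__ == "__main__":
    registry = ProvenanceRegistry()
    scan = b"pixel data from scanner"  # placeholder bytes, not a real image
    registry.register(scan, ProvenanceRecord("scanner-01", "2026-01-01T00:00:00Z"))
    print(registry.verify(scan))                # ProvenanceRecord(device_id='scanner-01', acquired_at='2026-01-01T00:00:00Z')
    print(registry.verify(b"synthetic image"))  # None
```

A match shows only that the bytes are unchanged since registration. An unregistered image is of unknown origin rather than proven synthetic, and any re-encoding, cropping, or resizing breaks the match, which is one reason watermarking and automated detection are usually discussed as complements to provenance records.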