Objective:
To evaluate the accuracy of artificial intelligence (AI) systems in assigning CT and MRI examination protocols.
Key Findings:
Overall accuracy of AI in protocoling is about 85%.
Accuracy for traditional machine learning models is 83%; for transformer-based models, it is 87%; and for large language models, it is 86%.
The highest-performing model, BioBERT, achieved an accuracy of 93%.
Common sources of protocoling errors include ambiguous requisition text and data imbalance.
Interpretation:
AI tools show strong potential to streamline radiology workflows, especially through hybrid systems that combine AI and radiologist review.
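One way to picture such a hybrid system is confidence-based routing: the model's suggestion is accepted automatically only when its confidence is high, and everything else goes to a radiologist. The sketch below is a minimal illustration under stated assumptions, not the method of any reviewed study. The `Prediction` type, the `route` function, the protocol names, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    protocol: str      # protocol label proposed by the model (hypothetical name)
    confidence: float  # model probability for that label, in [0, 1]


def route(pred: Prediction, threshold: float = 0.9) -> str:
    """Return 'auto' to accept the AI protocol, or 'review' to send it to a radiologist.

    The default threshold is an illustrative assumption, not a validated value.
    """
    if not 0.0 <= pred.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return "auto" if pred.confidence >= threshold else "review"


# Usage: a confident suggestion is accepted, an uncertain one is escalated.
print(route(Prediction("CT abdomen with contrast", 0.97)))  # auto
print(route(Prediction("MRI brain without contrast", 0.62)))  # review
```

In practice the threshold would need to be tuned against local data, since it trades radiologist workload against the share of AI errors that pass unreviewed.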
Limitations:
Ambiguous or incomplete requisition text can lead to incorrect protocol selection.
Models trained on imbalanced datasets may perform poorly on rare protocol categories.
Some AI errors reflect clinically acceptable alternatives rather than clear mistakes.
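The last two limitations can be made concrete with a small evaluation sketch. Per-class recall shows how a rare protocol can perform poorly even when overall accuracy looks high, and a "lenient" accuracy counts a prediction as correct when it falls in a set of clinically acceptable alternatives. The function names, the protocol labels, and the toy data are assumptions made up for illustration; none of the numbers come from the review.

```python
from collections import defaultdict


def strict_accuracy(y_true, y_pred):
    """Fraction of cases where the predicted protocol exactly matches the reference."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each reference protocol; exposes weak performance on rare classes."""
    total, correct = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}


def lenient_accuracy(y_true, y_pred, acceptable):
    """Like strict accuracy, but also accepts listed alternatives for each reference."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == t or p in acceptable.get(t, set()) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy data (hypothetical): 8 common-protocol cases and 2 rare-protocol cases.
y_true = ["CT_head"] * 8 + ["MRI_orbits"] * 2
y_pred = ["CT_head"] * 8 + ["CT_head", "MRI_orbits"]

print(strict_accuracy(y_true, y_pred))   # 0.9
print(per_class_recall(y_true, y_pred))  # {'CT_head': 1.0, 'MRI_orbits': 0.5}
# Assumes, purely for illustration, that CT_head is an acceptable alternative here.
print(lenient_accuracy(y_true, y_pred, {"MRI_orbits": {"CT_head"}}))  # 1.0
```

Reporting per-class recall and lenient accuracy alongside overall accuracy helps separate true errors on rare protocols from disagreements that a radiologist would consider acceptable.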
Conclusion:
Current performance levels suggest that AI tools could enhance radiology workflows; future research should focus on clinical trials and on fine-tuning large language models.
Radiologists assigned to receive step-by-step explanations from a large language model achieved higher diagnostic accuracy in a randomized vignette study, while differential-diagnosis outputs may have increased inappropriate reliance on incorrect model suggestions.