Which current chatbot is more competent in urological theoretical knowledge? A comparative analysis by the European board of urology in-service assessment

By Mehmet Fatih Şahin, Çağrı Doğan, Erdem Can Topkaç, Serkan Şeramet, Furkan Batuhan Tuncer, Cenk Murat Yazıcı

February 11, 2025

Clinical Report: Comparative Performance of Chatbots on European Board of Urology Exam

Overview

This study evaluated five advanced chatbots on European Board of Urology In-Service Assessment questions. Copilot Pro was the top performer with a 71.6% overall success rate, followed by Gemini Advanced (68.5%) and GPT-4o (65.8%), with strengths varying across urological subtopics. The findings highlight current AI capabilities in theoretical urological knowledge assessment.

Background

Chatbots powered by large language models are increasingly used by patients and clinicians for medical information retrieval. While these AI systems can access vast data, clinical success also depends on interpreting complex patient information, a skill traditionally unique to humans. The European Board of Urology (EBU) In-Service Assessment (ISA) provides a standardized, high-level test of urological knowledge, making it an ideal benchmark to evaluate chatbot proficiency. This study aimed to compare the performance of five licensed chatbots on EBU ISA questions to assess their theoretical knowledge and interpretative abilities in urology.

Data Highlights

| Chatbot | Overall Success Rate (%) | Exam 1 (%) | Exam 2 (%) | Exam 3 (%) | Top Subtopic Performance (%) |
| Copilot Pro | 71.6 | Not specified | 100 (Transplantation, Exam 2) | Not specified | Transplantation/Nephrology 77.8, Pediatrics/Congenital 75.0, Andrology/Infertility 72.7 |
| GPT-4o | 65.8 | 71.4 | Not specified | 56.5 | Lithiasis/Infections 73.7, Miscellaneous 73.0 |
| Gemini Advanced | 68.5 | Not specified | Not specified | Not specified | Miscellaneous 81.1 |

Key Findings

  • Copilot Pro achieved the highest overall success rate of 71.6%, passing all three exams and excelling in transplantation/nephrology with a perfect 100% score in Exam 2.
  • GPT-4o passed all exams with a 65.8% overall success rate, performing best in lithiasis/infections and miscellaneous categories but showing lower accuracy in trauma/emergency and transplantation/nephrology.
  • Gemini Advanced ranked second overall with 68.5% and achieved the highest score in the miscellaneous subtopic (81.1%).
  • Performance varied significantly across urological subtopics, with trauma/emergency and transplantation/nephrology being challenging areas for most chatbots.
  • The study utilized 596 multiple-choice questions from three EBU ISA exams, ensuring a comprehensive assessment of theoretical knowledge aligned with current EAU guidelines.
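The overall ranking reported above can be checked with a few lines of Python. This is only an illustrative sketch: the success rates and the 596-question total come from this summary, while the function names and the approximate correct-answer counts are assumptions. The counts presume every chatbot answered all 596 questions, which the summary does not state explicitly.

```python
# Overall success rates (%) as reported in this summary.
overall_success = {
    "Copilot Pro": 71.6,
    "GPT-4o": 65.8,
    "Gemini Advanced": 68.5,
}

TOTAL_QUESTIONS = 596  # question count across the three EBU ISA exams


def rank_by_success(rates):
    """Return (name, rate) pairs sorted from highest to lowest rate."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


def approx_correct(rate_percent, total=TOTAL_QUESTIONS):
    """Estimate the number of correct answers implied by a percentage.

    Assumption: the chatbot answered all `total` questions. The result
    is an estimate, not a figure reported by the study.
    """
    return round(rate_percent / 100 * total)


ranking = rank_by_success(overall_success)
for position, (name, rate) in enumerate(ranking, start=1):
    estimate = approx_correct(rate)
    print(f"{position}. {name}: {rate}% (about {estimate} of {TOTAL_QUESTIONS})")
```

Sorting by rate places Gemini Advanced between Copilot Pro and GPT-4o, which matches the second-place ranking stated in the findings.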

Clinical Implications

These findings suggest that advanced chatbots, particularly Copilot Pro, can reliably assist clinicians and trainees in accessing and reviewing urological knowledge based on standardized European guidelines. However, variability in performance across subtopics indicates that AI support should complement, not replace, expert clinical judgment, especially in complex or interpretative scenarios. Continuous updates and training of AI models are essential to improve their utility in clinical education and decision support.

Conclusion

Current state-of-the-art chatbots demonstrate promising proficiency in answering urological board exam questions, with Copilot Pro leading in overall accuracy. While AI can enhance knowledge acquisition, human expertise remains crucial for nuanced clinical interpretation.
