Clinical evaluation of large language model recommendations in melanoma: comparison with multidisciplinary tumor board decisions in a real-world cohort - Takeaways - MDSpire

By Belma Babic, Sefika Umihanic, Hedim Osmanovic, Nejra Selak, Erna Sehic-Kozica, Lejla Moranjkic, Inga Marijanovic, Marija Karaga, Amina Jalovcic Suljevic, Sekib Umihanic, Fadil Umihanic, Arzumana Ozegovic-Orucevic

June 29, 2026

  1. The study evaluated four large language models (LLMs) on melanoma treatment recommendations, comparing them against the decisions of a multidisciplinary tumor board.

  2. The retrospective study, conducted at a single center in Bosnia and Herzegovina, included 151 patients with newly diagnosed cutaneous melanoma.

  3. ChatGPT-5 Thinking performed best among the LLMs, followed by ChatGPT-4o; Gemini 2.5 Pro and DeepSeek-V3.2 were rated lower.

  4. Statistically significant differences in performance between LLMs were observed across all evaluated domains, particularly in complex treatment scenarios.

  5. The study suggests that LLMs may support melanoma MDT practice, but that they should not be used to make treatment decisions independently without further research.
