Clinical evaluation of large language model recommendations in melanoma: comparison with multidisciplinary tumor board decisions in a real-world cohort - Summary - MDSpire

By Belma Babic, Sefika Umihanic, Hedim Osmanovic, Nejra Selak, Erna Sehic-Kozica, Lejla Moranjkic, Inga Marijanovic, Marija Karaga, Amina Jalovcic Suljevic, Sekib Umihanic, Fadil Umihanic, Arzumana Ozegovic-Orucevic

June 29, 2026

Objective:

To evaluate how well four large language models (LLMs) generate melanoma treatment recommendations, compared with the real-world decisions of a multidisciplinary tumor board (MDT).

Approach:
  • Study Design: Retrospective single-center study involving 151 patients with newly diagnosed cutaneous melanoma discussed at the MDT.
  • LLM Evaluation: Four board-certified oncologists compared the recommendations of four LLMs (ChatGPT-4o, ChatGPT-5 Thinking, Gemini 2.5 Pro, DeepSeek-V3.2) against the actual MDT decisions.
  • Rating Domains: LLM-generated recommendations were rated on clarity, clinical applicability, coverage, explanation and support with evidence, and guideline concordance.
Key Findings:
  • Inter-rater reliability among oncologists was acceptable to good.
  • ChatGPT-5 Thinking demonstrated the strongest overall performance among the LLMs.
  • Statistically significant performance differences were observed across all evaluated domains.
  • Performance differences were most clinically relevant in complex treatment scenarios.
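
For readers unfamiliar with inter-rater reliability, the sketch below shows one common way to quantify agreement among several raters: the two-way random-effects intraclass correlation coefficient for a single rater, ICC(2,1). This summary does not state which statistic the study used, so the choice of ICC(2,1) is an assumption, as are the function name `icc2_1` and the `toy_ratings` values, which are invented for illustration and are not study data.

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per rated item, each holding one
    score per rater (same rater order in every row).
    """
    n = len(ratings)        # number of rated items
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_rows - ms_err) / denom


# Hypothetical scores: 5 recommendations (rows) rated by 4 raters (columns).
# These numbers are made up for the example, not taken from the study.
toy_ratings = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]

if __name__ == "__main__":
    print(f"ICC(2,1) on toy data: {icc2_1(toy_ratings):.2f}")
```

Values closer to 1 indicate stronger agreement. The cut-offs behind labels such as "acceptable" or "good" vary between conventions and are not given in this summary.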
Interpretation:

Selected LLMs may support melanoma MDT practice in resource-limited settings.

Limitations:
  • The study is retrospective and conducted at a single center.
  • Further prospective studies are needed to validate LLM-assisted treatment recommendations.
Conclusion:

LLMs show potential as supportive tools in melanoma treatment decision-making.
