Clinical evaluation of large language model recommendations in melanoma: comparison with multidisciplinary tumor board decisions in a real-world cohort

By
Belma Babic
Sefika Umihanic
Hedim Osmanovic
Nejra Selak
Erna Sehic-Kozica
Lejla Moranjkic
Inga Marijanovic
Marija Karaga
Amina Jalovcic Suljevic
Sekib Umihanic
Fadil Umihanic
Arzumana Ozegovic-Orucevic
June 29, 2026

Frontiers in Oncology

Objective:

To evaluate the performance of four large language models (LLMs) in generating melanoma treatment recommendations compared to real-world decisions made by a multidisciplinary tumor board (MDT).

Approach:

Study Design: Retrospective single-center study involving 151 patients with newly diagnosed cutaneous melanoma discussed at the MDT.
LLM Evaluation: Four board-certified oncologists compared recommendations from four LLMs (ChatGPT-4o, ChatGPT-5 Thinking, Gemini 2.5 Pro, DeepSeek-V3.2) against the actual MDT decisions.
Rating Domains: LLM-generated recommendations were rated on clarity, clinical applicability, coverage, explanation and support with evidence, and guideline concordance.
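
As a rough illustration of the rating design described above, the sketch below averages rater scores per model and per domain. The summary does not state the rating scale or the analysis actually used, so the 1-5 scale, the tuple layout, the function name mean_scores, and the example scores are all assumptions made for illustration, not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Model and domain names as listed in the summary.
MODELS = ["ChatGPT-4o", "ChatGPT-5 Thinking", "Gemini 2.5 Pro", "DeepSeek-V3.2"]
DOMAINS = [
    "clarity",
    "clinical applicability",
    "coverage",
    "explanation and support with evidence",
    "guideline concordance",
]


def mean_scores(ratings):
    """Average scores per (model, domain) pair.

    ratings: iterable of (patient_id, rater_id, model, domain, score) tuples.
    The tuple layout is a hypothetical choice made for this sketch.
    """
    buckets = defaultdict(list)
    for _patient, _rater, model, domain, score in ratings:
        if model not in MODELS or domain not in DOMAINS:
            raise ValueError(f"unknown model or domain: {model!r}, {domain!r}")
        buckets[(model, domain)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up placeholder ratings on an assumed 1-5 scale (not study data).
example = [
    ("P001", "R1", "ChatGPT-4o", "clarity", 4),
    ("P001", "R2", "ChatGPT-4o", "clarity", 5),
    ("P001", "R1", "Gemini 2.5 Pro", "clarity", 3),
    ("P001", "R2", "Gemini 2.5 Pro", "clarity", 4),
]
summary = mean_scores(example)
```

Keeping one record per patient, rater, model, and domain means the same table could later feed a reliability or significance analysis without reshaping.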

Key Findings:

Inter-rater reliability among oncologists was acceptable to good.
ChatGPT-5 Thinking demonstrated the strongest overall performance among the LLMs.
Statistically significant performance differences were observed across all evaluated domains.
Performance differences were most clinically relevant in complex treatment scenarios.

Interpretation:

Selected LLMs may support melanoma MDT practice in resource-limited settings.

Limitations:

The study is retrospective and conducted at a single center.
Further prospective studies are needed to validate LLM-assisted treatment recommendations.

Conclusion:

LLMs show potential as supportive tools in melanoma treatment decision-making, though prospective validation is still needed.
