Clinical evaluation of large language model recommendations in melanoma: comparison with multidisciplinary tumor board decisions in a real-world cohort

By
Belma Babic
Sefika Umihanic
Hedim Osmanovic
Nejra Selak
Erna Sehic-Kozica
Lejla Moranjkic
Inga Marijanovic
Marija Karaga
Amina Jalovcic Suljevic
Sekib Umihanic
Fadil Umihanic
Arzumana Ozegovic-Orucevic
June 29, 2026

Frontiers in Oncology

Overview

This study evaluates how well four large language models (LLMs) generate treatment recommendations for melanoma, using the decisions of a multidisciplinary tumor board in a real-world cohort as the comparator.

Background

Malignant melanoma is a significant global health challenge: incidence is rising, and effective treatment strategies are needed. Multidisciplinary tumor boards (MDTs) play a crucial role in decision-making for melanoma management, particularly in resource-limited settings. Before large language models (LLMs) are integrated into this process, their recommendations require thorough evaluation.

Data Highlights

LLM	Performance Rating
ChatGPT-5 Thinking	Strongest
ChatGPT-4o	Moderate
Gemini 2.5 Pro	Less Favorable
DeepSeek-V3.2	Least Favorable

Key Findings

Inter-rater reliability among oncologists was acceptable to good.
ChatGPT-5 Thinking showed consistent performance across evaluated domains.
Statistically significant differences were observed between the LLMs in all domains assessed.
Performance differences were most relevant in complex treatment scenarios.
LLM-generated recommendations should not replace independent treatment decisions.
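
The summary does not say which statistic was used to measure inter-rater reliability among the oncologists. Purely as an illustration of how agreement between two raters can be quantified, the sketch below computes Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The choice of kappa, the function name `cohens_kappa`, and the ratings are all assumptions made for this example; none of them come from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items.

    Illustrative only: the study's actual reliability statistic
    is not stated in this summary.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores (1-5 scale) from two oncologists for eight
# LLM-generated recommendations. Not data from the study.
rater_a = [5, 4, 4, 3, 5, 2, 4, 5]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa is defined for two raters and treats categories as unordered. With more than two raters, or when the distance between scores matters, other statistics such as a weighted kappa or an intraclass correlation coefficient are often preferred.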

Clinical Implications

The findings indicate that LLMs may have a role in melanoma treatment decision-making, but their recommendations should be used as supportive tools rather than as standalone treatment decisions.

Conclusion

This study emphasizes the need for further research before LLMs can be integrated into clinical workflows.

