Large language models for breast cancer treatment planning: a blinded real-world evaluation of DeepSeek, ChatGPT, and oncologist recommendations

By Ming Li, Yiran Yu, Gang Li, Xiaoli Zhang, Yuting Shi, and Rila Su
June 30, 2026

Frontiers In Digital Health

Overview

This study evaluates the accuracy and concordance of two large language models, DeepSeek V3.1 and ChatGPT-5, against oncologist recommendations in breast cancer treatment planning.

Background

Breast cancer remains the most prevalent cancer among women, which makes effective treatment planning essential. Large language models (LLMs) are gaining attention as oncology decision-support tools, yet their real-world applicability and alignment with clinical practice require thorough investigation. This study examines how well LLMs generate treatment recommendations for breast cancer, particularly in complex cases.

Data Highlights

Model	Accuracy Score (Mean ± SD)	Internal Variance	Clinician Agreement
DeepSeek V3.1	4.91 ± 0.36	Minimal	74.2%
ChatGPT-5	4.65 ± 0.62	Higher	Declined with stage
Clinicians	3.82 ± 0.63	Higher	Varied by stage

Key Findings

DeepSeek V3.1 achieved the highest expert-rated accuracy scores (4.91 ± 0.36).
ChatGPT-5 scored lower (4.65 ± 0.62) than DeepSeek V3.1.
Clinician recommendations had the lowest accuracy score (3.82 ± 0.63).
The outputs of the two AI models showed high mutual consistency (74.2%).
AI-clinician agreement decreased significantly with advanced disease stages (P < 0.001).
In Stage IV cases, clinicians prioritized real-world constraints such as financial toxicity.

Clinical Implications

The findings highlight the limitations of LLMs in addressing complex clinical contexts and socioeconomic factors.

Conclusion

Advanced LLMs demonstrate strong performance in generating standardized breast cancer treatment plans.


