Large language models for breast cancer treatment planning: a blinded real-world evaluation of DeepSeek, ChatGPT, and oncologist recommendations - Takeaways - MDSpire


By Ming Li, Yiran Yu, Gang Li, Xiaoli Zhang, Yuting Shi, and Rila Su

June 30, 2026


1. The study compared breast cancer treatment plans generated by DeepSeek V3.1 and ChatGPT-5 with plans made by experienced oncologists, in a blinded real-world evaluation.

2. DeepSeek V3.1 achieved the highest expert-rated accuracy score (4.91 ± 0.36), outperforming both ChatGPT-5 and the oncologists.

3. AI-generated recommendations showed higher guideline concordance and lower variability than historical oncologist plans.

4. Agreement between the AI models and clinicians decreased significantly as disease stage advanced, most markedly in Stage IV cases.

5. The findings highlight the potential of LLMs as decision-support tools, while underscoring the continued need for human judgment in individualized care.
