Evaluating the accuracy and communication quality of large language models in Ewing sarcoma: a comparative analysis of ChatGPT, Claude, Gemini, DeepSeek, and Grok - Report - MDSpire

  • By Cihan Ünyılmaz

  • June 30, 2026

Clinical Report: Assessing the Precision and Communication Effectiveness of LLMs

Overview

This study compares how well five large language models (LLMs), ChatGPT, Claude, Gemini, DeepSeek, and Grok, provide information about Ewing sarcoma, in terms of both accuracy and communication quality.

Background

Ewing sarcoma is a rare and aggressive pediatric cancer that requires complex management by multidisciplinary teams. Accurate communication is critical, as families seek reliable information about diagnosis, treatment options, and prognosis. Because LLMs are being used as a source of medical information, their effectiveness in delivering quality patient education needs to be assessed.

Data Highlights

Model      Overall Performance   Technical Accuracy   Communication Quality
ChatGPT    Highest               Moderate             Best
Claude     Second                Moderate             Good
DeepSeek   Third                 Highest              Lower
Gemini     Lower                 Low                  Low
Grok       Lowest                Low                  Low

Key Findings

  • ChatGPT achieved the highest overall performance among the LLMs evaluated.
  • DeepSeek demonstrated the greatest technical accuracy but lower communication quality.
  • Gemini and Grok produced more superficial responses with lower overall scores.
  • Significant differences in performance were observed among the five LLMs (p < 0.001).
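
This summary does not state which statistical test produced the reported p-value. As a rough illustration of how a difference among five groups of ratings can be tested, the sketch below implements a Kruskal-Wallis test, a common choice for ordinal scores. The choice of test, the function name `kruskal_wallis`, and the placeholder scores are all assumptions made for illustration; they are not the study's method or data.

```python
import math


def kruskal_wallis(groups):
    """Return (H, p) for a Kruskal-Wallis test across the given groups.

    The p-value uses the chi-square approximation. To stay free of
    third-party dependencies, it is implemented only for an even number
    of degrees of freedom (e.g. five groups -> 4 degrees of freedom).
    """
    # Pool every score, remembering which group it came from.
    pooled = sorted((v, g) for g, vals in enumerate(groups) for v in vals)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0

    # Assign average ranks to runs of tied values.
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    h = 12 / (n * (n + 1)) * sum(
        r * r / len(vals) for r, vals in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # standard correction for ties

    df = len(groups) - 1
    if df % 2:
        raise ValueError("this sketch only handles an even number of degrees of freedom")
    # Closed-form chi-square survival function for even df.
    half = h / 2
    p = math.exp(-half) * sum(half**m / math.factorial(m) for m in range(df // 2))
    return h, p


# Placeholder ratings for five hypothetical models (NOT the study's data).
placeholder_scores = [
    [17, 18, 19, 20],  # model_a
    [13, 14, 15, 16],  # model_b
    [9, 10, 11, 12],   # model_c
    [5, 6, 7, 8],      # model_d
    [1, 2, 3, 4],      # model_e
]

h_stat, p_value = kruskal_wallis(placeholder_scores)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

A small p-value from a test like this indicates only that at least one model's scores differ from the others; identifying which models differ would require pairwise follow-up comparisons with a correction for multiple testing.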

Clinical Implications

Current LLMs can support patient education but should not replace specialist consultation.

Conclusion

This study emphasizes the need for careful evaluation and validation of LLMs before their routine use in clinical practice.
