Benchmarking Large Language Models and Prompt Engineering Strategies in Microsatellite Instability Cancers: Evaluation Study - Report - MDSpire

Benchmarking Large Language Models and Prompt Engineering Strategies in Microsatellite Instability Cancers: Evaluation Study

By Yuxin Zhang, Jie Song, Cheng Bi, Xin Zheng, Zhichuan Xu, Dan Cao, Bairong Shen

May 21, 2026

Clinical Report: Assessing Large Language Models in Cancers with MSI

Overview

This study evaluates the performance of large language models (LLMs) on microsatellite instability (MSI) in cancer. Using the Microsatellite Instability Cancer Benchmark (MSIC-Bench), it identifies significant gaps in the models' specialized knowledge and examines retrieval-augmented generation (RAG) as a potential solution.

Background

Microsatellite instability (MSI) is a critical cancer biomarker that influences diagnosis, prognosis, and treatment strategy. Despite its importance, the application of artificial intelligence, particularly large language models (LLMs), to MSI-related cancer care remains underexplored. Understanding how LLMs can assist in this domain could improve personalized therapy for patients with MSI-positive cancers.

Data Highlights

No numerical or trial data are presented in the article.

Key Findings

  • The MSIC-Bench framework was developed to evaluate LLMs on both foundational and frontier knowledge in MSI-related cancer.
  • Three LLMs (GPT-4o, Gemini 2.5 Pro, Claude Opus 4) were assessed across four prompting strategies.
  • Standard LLMs (used without retrieval) exhibited a significant deficit in specialized MSI knowledge, which limited their performance.
  • Retrieval-augmented generation (RAG) shifted the error mode from knowledge deficits to information retrieval failures.
  • RAG systems can introduce 'false refusals,' which may degrade utility despite improving safety.
  • Integrating broad clinical guidelines with specialized knowledge in RAG architectures can enhance system robustness.
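
The error modes named in the findings above (knowledge deficits, retrieval failures, false refusals) can be made concrete with a small tallying sketch. This is an illustration only, not the study's grading method: the `Outcome` fields, the category names, and the rule that assigns each graded answer to a category are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Outcome:
    """One graded benchmark answer. All fields are hypothetical annotations."""
    answerable: bool          # the question has a supported answer
    refused: bool             # the model declined to answer
    correct: bool             # the answer matched the reference
    used_retrieval: bool      # the answer came from a RAG pipeline
    evidence_retrieved: bool  # the retriever returned the needed passage


def error_mode(o: Outcome) -> str:
    """Assign one outcome to a single category (assumed rule, for illustration)."""
    if o.refused:
        # Declining an answerable question costs utility: a false refusal.
        return "false_refusal" if o.answerable else "appropriate_refusal"
    if o.correct:
        return "correct"
    if not o.used_retrieval:
        # A wrong answer from the bare model is attributed to missing knowledge.
        return "knowledge_deficit"
    # A wrong answer from a RAG pipeline: blame retrieval if the evidence was absent.
    return "other_error" if o.evidence_retrieved else "retrieval_failure"


def tally(outcomes: list[Outcome]) -> Counter:
    """Count how many outcomes fall into each category."""
    return Counter(error_mode(o) for o in outcomes)


if __name__ == "__main__":
    # Made-up outcomes that only exercise the categories; not study data.
    demo = [
        Outcome(True, False, False, False, False),  # bare model, wrong
        Outcome(True, False, False, True, False),   # RAG, evidence missing
        Outcome(True, True, False, True, True),     # RAG, refused anyway
        Outcome(True, False, True, True, True),     # RAG, correct
    ]
    for mode, count in sorted(tally(demo).items()):
        print(f"{mode}: {count}")
```

Tallying categories separately for each model and prompting strategy, with and without retrieval, is one way to see whether errors move from `knowledge_deficit` toward `retrieval_failure` and `false_refusal`, which is the pattern the findings describe.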

Clinical Implications

The findings suggest that while LLMs have potential in cancer care, their current limitations in specialized knowledge must be addressed. Implementing RAG architectures may improve the accuracy and reliability of LLM responses in clinical settings, ultimately aiding in the management of MSI-positive cancers.

Conclusion

This study underscores the need for further development of LLMs tailored to the complexities of MSI in cancer. By addressing knowledge gaps and optimizing retrieval strategies, LLMs can become valuable tools in personalized cancer therapy.
