Benchmarking Large Language Models and Prompt Engineering Strategies in Microsatellite Instability Cancers: Evaluation Study

By
Yuxin Zhang
Jie Song
Cheng Bi
Xin Zheng
Zhichuan Xu
Dan Cao
Bairong Shen
May 21, 2026

Journal of Medical Internet Research (JMIR)

Overview

This study evaluates the performance of large language models (LLMs) in the context of microsatellite instability (MSI) in cancer. Using the Microsatellite Instability Cancer Benchmark (MSIC-Bench), the research highlights significant gaps in specialized knowledge among LLMs and introduces retrieval-augmented generation (RAG) as a potential solution.

Background

Microsatellite instability (MSI) is a critical biomarker in cancer that influences diagnosis, prognosis, and treatment strategies. Despite its importance, the application of artificial intelligence, particularly large language models (LLMs), in MSI-related cancer care is underexplored. Understanding how LLMs can assist in this domain could enhance personalized therapeutic approaches for MSI-positive patients.

Data Highlights

No numerical data or trial data presented in the article.

Key Findings

The MSIC-Bench framework was developed to evaluate LLMs on both foundational and frontier knowledge in MSI-related cancer.
Three LLMs (GPT-4o, Gemini 2.5 Pro, Claude Opus 4) were assessed across four prompting strategies.
Without retrieval support, the LLMs showed a significant deficit in specialized MSI knowledge, which limited their performance.
Retrieval-augmented generation (RAG) shifted the error mode from knowledge deficits to information retrieval failures.
RAG systems can introduce "false refusals" (declining to answer questions that could have been answered), which may degrade utility even as they improve safety.
Integrating broad clinical guidelines with specialized knowledge in RAG architectures can enhance system robustness.
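
The evaluation design and error modes described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's code: the summary names the three models but not the four prompting strategies, so the strategy labels below are placeholders, and the error-mode labels and the `summarize` helper are assumptions introduced for illustration. The toy labels are made up and are not study data.

```python
from collections import Counter
from itertools import product

# Models named in the summary.
MODELS = ["GPT-4o", "Gemini 2.5 Pro", "Claude Opus 4"]

# The summary does not name the four prompting strategies;
# these labels are placeholders, not the study's terminology.
STRATEGIES = ["strategy_A", "strategy_B", "strategy_C", "strategy_D"]

# Hypothetical grading labels, loosely following the findings listed above.
ERROR_MODES = {"correct", "knowledge_deficit", "retrieval_failure", "false_refusal"}


def evaluation_grid():
    """Return every (model, strategy) pair that would need to be scored."""
    return list(product(MODELS, STRATEGIES))


def summarize(labels):
    """Return the share of each outcome label among the graded answers."""
    unknown = set(labels) - ERROR_MODES
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)  # missing labels count as 0
    return {mode: counts[mode] / total for mode in sorted(ERROR_MODES)}


if __name__ == "__main__":
    print(len(evaluation_grid()))  # 3 models x 4 strategies
    # Made-up grading labels for one configuration, for illustration only.
    toy = ["correct", "false_refusal", "retrieval_failure", "correct"]
    print(summarize(toy))
```

Tracking false refusals as their own outcome, rather than folding them into "incorrect", is one way to make the safety versus utility trade-off visible when comparing configurations with and without retrieval.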

Clinical Implications

The findings suggest that while LLMs have potential in cancer care, their current limitations in specialized knowledge must be addressed. Implementing RAG architectures may improve the accuracy and reliability of LLM responses in clinical settings, ultimately aiding in the management of MSI-positive cancers.

Conclusion

This study underscores the need for further development of LLMs tailored to the complexities of MSI in cancer. By addressing knowledge gaps and optimizing retrieval strategies, LLMs can become valuable tools in personalized cancer therapy.

