Benchmarking Large Language Models and Prompt Engineering Strategies in Microsatellite Instability Cancers: Evaluation Study

By Yuxin Zhang, Jie Song, Cheng Bi, Xin Zheng, Zhichuan Xu, Dan Cao, Bairong Shen
May 21, 2026

Journal of Medical Internet Research (JMIR)

Objective:

To develop and validate MSIC-Bench, a novel benchmark specifically designed for evaluating large language models (LLMs) in the context of microsatellite instability (MSI) cancer care, and to systematically assess the capabilities and limitations of state-of-the-art LLMs.

Key Findings:

Standard LLMs, used without retrieval, show a significant deficit in specialized MSI cancer knowledge.
Retrieval-augmented generation (RAG) shifts the bottleneck from knowledge to information retrieval, making 'retrieval failure' the new dominant error mode.
RAG systems can transform high-risk fabrications into safer refusals but may also introduce 'false refusals' (incorrect denials of information), which degrade utility.
Integrating broad clinical guidelines with specialized knowledge in RAG architectures offers a practical solution for improving LLM performance.
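
The error modes named in these findings can be made concrete with a small tallying sketch. It is not the study's grading procedure: the `Outcome` fields, the category names, and the decision rules (for example, treating a refusal of an answerable question as a 'false refusal') are assumptions chosen only to illustrate how such outcomes might be separated.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Outcome:
    """One graded benchmark item. All fields are hypothetical labels."""
    answerable: bool          # the knowledge base contains the answer
    evidence_retrieved: bool  # the retriever returned the supporting passage
    refused: bool             # the model declined to answer
    correct: bool             # the answer matched the reference


def error_mode(o: Outcome) -> str:
    """Assign one illustrative category to a graded item (assumed rules)."""
    if o.refused:
        # Declining an answerable question costs utility; declining an
        # unanswerable one is the safer alternative to fabricating.
        return "false_refusal" if o.answerable else "safe_refusal"
    if o.correct:
        return "correct"
    if o.answerable and not o.evidence_retrieved:
        # The answer existed, but the retriever never surfaced it.
        return "retrieval_failure"
    return "fabrication_or_reasoning_error"


def tally_modes(outcomes: list[Outcome]) -> Counter:
    """Count how often each category occurs across a set of items."""
    return Counter(error_mode(o) for o in outcomes)
```

Comparing such tallies for a model with and without retrieval would show whether fabrications are being converted into refusals, and how many of those refusals are false ones.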

Interpretation:

The study highlights the current capabilities and limitations of LLMs in oncology, providing a roadmap for their future development and safe clinical integration, with implications for improving patient care.

Limitations:

The study focuses on a small number of LLMs (three) and prompting strategies (four).
The evaluation may not encompass all potential clinical scenarios or MSI-related complexities.

Conclusion:

The findings provide actionable insights for developing more robust LLM systems in MSI cancer care.
