CancerLLM: a large language model in cancer domain - Scorecard - MDSpire

CancerLLM: a large language model in cancer domain

By Mingchen Li, Zaifu Zhan, Jiatan Huang, Jeremy Yeung, Kai Ding, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang
February 20, 2026

npj Digital Medicine


Clinical Scorecard: CancerLLM: A Specialized Language Model for Oncology Applications

At a Glance

Category	Detail
Condition	Cancer phenotyping and diagnosis
Key Mechanisms	7-billion-parameter language model with a Mistral-style architecture, pre-trained on clinical notes and pathology reports, then fine-tuned for cancer phenotype extraction and diagnosis generation
Target Population	Patients with 17 different cancer types
Care Setting	Clinical research and healthcare settings requiring cancer diagnosis and phenotyping support

Key Highlights

CancerLLM achieved an F1 score of 91.78% on cancer phenotype extraction and 86.81% on diagnosis generation.
Outperformed existing large language models, with an average F1 score improvement of 9.23%.
Showed efficient use of computational resources (time and GPU usage) and greater robustness than other LLMs.

Guideline-Based Recommendations

Diagnosis

Utilize CancerLLM for automated extraction of cancer phenotypes from clinical notes and pathology reports.
Apply CancerLLM-generated diagnosis suggestions to support clinical decision-making in oncology.

Management

Incorporate CancerLLM outputs to enhance accuracy and efficiency in cancer diagnosis workflows.

Monitoring & Follow-up

Monitor model performance on internal benchmarks to ensure continued accuracy and robustness.

Risks

Account for computational resource requirements, even though CancerLLM is comparatively efficient.
Validate model outputs clinically to avoid overreliance on automated diagnosis.

Patient & Prescribing Data

The clinical notes and pathology reports used for model training covered patients across 17 cancer types.

CancerLLM supports phenotype extraction and diagnosis generation to inform personalized oncology treatment decisions.

Clinical Best Practices

Use CancerLLM as a decision support tool rather than a standalone diagnostic system.
Combine CancerLLM outputs with clinical expertise and additional diagnostic data.
Leverage publicly available code and synthetic datasets for replication and extension of CancerLLM in clinical research.

References

Original Source(s)

npj Digital Medicine

CancerLLM: a large language model in cancer domain

by Mingchen Li, Zaifu Zhan, Jiatan Huang, Jeremy Yeung, Kai Ding, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang
February 20, 2026
