CancerLLM: a large language model in cancer domain - Scorecard - MDSpire

CancerLLM: a large language model in cancer domain

By Mingchen Li, Zaifu Zhan, Jiatan Huang, Jeremy Yeung, Kai Ding, Anne Blaes, Steven Johnson, Hongfang Liu, Hua Xu, Rui Zhang

February 20, 2026

Clinical Scorecard: CancerLLM: A Specialized Language Model for Oncology Applications

At a Glance

Category: Detail
Condition: Cancer phenotyping and diagnosis
Key Mechanisms: 7-billion-parameter Mistral-style language model trained on clinical notes and pathology reports, fine-tuned for cancer phenotype extraction and diagnosis generation
Target Population: Patients with 17 different cancer types
Care Setting: Clinical research and healthcare settings requiring cancer diagnosis and phenotyping support

Key Highlights

  • CancerLLM achieved an F1 score of 91.78% on cancer phenotype extraction and 86.81% on diagnosis generation.
  • It outperformed existing large language models, with an average F1 score improvement of 9.23%.
  • It was also efficient in computational resources (time and GPU usage) and robust compared to other LLMs.
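
For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed for phenotype extraction. It is a minimal illustration, not the authors' evaluation code: it assumes exact-match scoring over sets of (field, value) pairs, and the `Phenotype` schema, the `f1_score` helper, and the sample records are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical schema: each extracted phenotype is a (field, value) pair.
Phenotype = Tuple[str, str]


def f1_score(gold: Iterable[Phenotype], predicted: Iterable[Phenotype]) -> float:
    """Exact-match F1 between gold and predicted phenotype sets (an assumption;
    the source does not state which matching scheme was used)."""
    gold_set: Set[Phenotype] = set(gold)
    pred_set: Set[Phenotype] = set(predicted)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Invented example records, for illustration only.
gold = [("tumor_size", "2.3 cm"), ("stage", "T2N0"), ("histology", "adenocarcinoma")]
pred = [("tumor_size", "2.3 cm"), ("stage", "T2N1"), ("histology", "adenocarcinoma")]
score = f1_score(gold, pred)
print(f"F1 = {score:.4f}")
```

Here two of the three predicted phenotypes match the gold annotations, so precision and recall are both 2/3 and the F1 score is also 2/3.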

Guideline-Based Recommendations

Diagnosis

  • Utilize CancerLLM for automated extraction of cancer phenotypes from clinical notes and pathology reports.
  • Apply CancerLLM-generated diagnosis suggestions to support clinical decision-making in oncology.

Management

  • Incorporate CancerLLM outputs to enhance accuracy and efficiency in cancer diagnosis workflows.

Monitoring & Follow-up

  • Monitor model performance on internal benchmarks to ensure continued accuracy and robustness.
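
One way to act on this recommendation is to compare each new benchmark run against a stored baseline and flag any drop beyond a tolerance. The sketch below assumes that approach. The `BenchmarkCheck` class, the task names, the 0.02 tolerance, and the `current_f1` values are hypothetical placeholders; only the baseline figures come from the scores reported above.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCheck:
    """One benchmark comparison (hypothetical structure, for illustration)."""
    task: str
    baseline_f1: float       # reference score, e.g. the reported F1
    current_f1: float        # score from the latest internal benchmark run
    tolerance: float = 0.02  # assumed acceptable drop before flagging

    def degraded(self) -> bool:
        # Flag the task when the current score falls below baseline minus tolerance.
        return self.current_f1 < self.baseline_f1 - self.tolerance


checks = [
    # Baselines are the reported scores; current values are invented placeholders.
    BenchmarkCheck("phenotype_extraction", baseline_f1=0.9178, current_f1=0.9101),
    BenchmarkCheck("diagnosis_generation", baseline_f1=0.8681, current_f1=0.8302),
]

flagged = [check.task for check in checks if check.degraded()]
print("Tasks needing review:", flagged)
```

In practice the tolerance and the review process for a flagged task would be set locally, since the source does not specify either.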

Risks

  • Account for the computational resources the model still requires, despite its efficiency improvements.
  • Validate model outputs clinically to avoid overreliance on automated diagnosis.

Patient & Prescribing Data

The clinical notes and pathology reports used for model training represent patients across 17 cancer types.

CancerLLM supports phenotype extraction and diagnosis generation to inform personalized oncology treatment decisions.

Clinical Best Practices

  • Use CancerLLM as a decision support tool rather than a standalone diagnostic system.
  • Combine CancerLLM outputs with clinical expertise and additional diagnostic data.
  • Leverage publicly available code and synthetic datasets for replication and extension of CancerLLM in clinical research.
