Detection of cancer recurrence from Thai-English electronic medical records using sentence embeddings - Scorecard - MDSpire

  • By Ekapob Sangariyavanich, Wanchana Ponthongmak, Nawanan Theera-Ampornpunt, Nat Tangchitnob, Gareth J McKay, Ammarin Thakkinstian

  • July 2, 2026

Clinical Scorecard: Identification of Cancer Recurrence through Thai-English Electronic Medical Records Utilizing Sentence Embeddings

At a Glance

Category | Detail
Condition | Cancer Recurrence Detection
Key Mechanisms | Sentence-bidirectional encoder representations from transformers (SBERT) models
Target Population | Patients with breast, colorectal, cervical, and head and neck cancers
Care Setting | Multicentre oncology hospitals in Thailand

Key Highlights

  • Developed and validated SBERT models for cancer recurrence detection in Thai-English EMRs
  • MetBERT achieved the highest AUPRC for locoregional versus no recurrence and for locoregional versus distant recurrence
  • Bilingual-SBERT demonstrated robust performance during external validation
  • Low AUPRC values reflect extreme class imbalance arising from low recurrence prevalence
  • Models are suitable for clinical integration as a screening tool in cancer registry workflows
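
The link between AUPRC and class imbalance can be illustrated with a small calculation. The sketch below is not the authors' evaluation code: it computes average precision (a common step-wise estimate of AUPRC) on made-up labels and scores, and compares it with the prevalence, which is the expected AUPRC of a model that ranks records at random. The function name and all data are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at each positive, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


# Hypothetical toy data: 2 recurrences among 20 records (10% prevalence).
labels = [1, 0, 0, 1] + [0] * 16
scores = [0.9, 0.8, 0.7, 0.6] + [0.1] * 16

prevalence = sum(labels) / len(labels)
ap = average_precision(labels, scores)
print(f"prevalence (random baseline): {prevalence:.2f}")
print(f"average precision: {ap:.2f}")
```

Because the random baseline equals the prevalence, an AUPRC that looks low in absolute terms can still be far above chance when recurrence is rare, so AUPRC values are best read relative to prevalence.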

Guideline-Based Recommendations

Diagnosis

  • Use SBERT models to detect cancer recurrence in EMRs

Management

  • Implement bilingual-SBERT as a screening tool for prioritizing high-probability records
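
Prioritizing high-probability records could look roughly like the following. This is a minimal sketch under stated assumptions, not the workflow described in the source: the `triage` helper, the record identifiers, the probabilities, and the 0.40 threshold are all hypothetical.

```python
def triage(records, threshold):
    """Split records into a review queue (score >= threshold) and the rest.

    `records` is a list of (record_id, probability) pairs; the queue is
    returned with the highest-probability records first.
    """
    queue = sorted(
        (r for r in records if r[1] >= threshold),
        key=lambda r: r[1],
        reverse=True,
    )
    skipped = [r for r in records if r[1] < threshold]
    return queue, skipped


# Hypothetical model outputs: (record id, predicted recurrence probability).
records = [("A", 0.05), ("B", 0.92), ("C", 0.40), ("D", 0.71), ("E", 0.02)]

queue, skipped = triage(records, threshold=0.40)
review_ids = [record_id for record_id, _ in queue]
workload_fraction = len(queue) / len(records)
print(review_ids, workload_fraction)
```

In practice the threshold would presumably be chosen on validation data, since records below it are not reviewed and any recurrence among them would be missed.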

Monitoring & Follow-up

  • Regularly validate model performance with external datasets

Risks

  • Consider the impact of class imbalance on model performance
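
One way to see the impact of class imbalance is to hold a model's sensitivity and specificity fixed and vary the prevalence. The sketch below applies Bayes' rule to compute the positive predictive value; the operating point (0.90 sensitivity and specificity) and the prevalence values are illustrative assumptions, not figures from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: P(recurrence | flagged) for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, held fixed while prevalence changes.
sensitivity, specificity = 0.90, 0.90

for prevalence in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    print(f"prevalence {prevalence:.2f} -> PPV {ppv:.3f}")
```

As prevalence falls, a growing share of flagged records are false positives even though the model itself is unchanged, which is why precision-based metrics drop under extreme imbalance.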

Patient & Prescribing Data

Patients with breast, colorectal, cervical, and head and neck cancers

Models can streamline the identification of recurrence, reducing manual workload for registrars

Clinical Best Practices

  • Integrate sentence embedding frameworks into clinical workflows
  • Ensure continuous external validation of models in diverse clinical settings
  • Address data completeness and accuracy challenges in EMRs
