Detection of cancer recurrence from Thai-English electronic medical records using sentence embeddings

By Ekapob Sangariyavanich, Wanchana Ponthongmak, Nawanan Theera-Ampornpunt, Nat Tangchitnob, Gareth J McKay, Ammarin Thakkinstian

July 2, 2026


  1. This study developed monolingual and bilingual SBERT models for detecting cancer recurrence in Thai-English electronic medical records.

  2. A dataset of 32,436 documents from 1,250 patients was used for model development, with external validation from 9,244 documents across two hospitals.

  3. MetBERT achieved the highest AUPRC for locoregional versus no recurrence and locoregional versus distant recurrence in both development and validation.

  4. Bilingual-SBERT demonstrated robust external validation performance, particularly for distant versus no recurrence, with AUPRC values of 17.55%–24.39%.

  5. The study validates sentence embedding models for mixed Thai-English EMRs, providing a practical solution for cancer recurrence detection.
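For readers unfamiliar with AUPRC (area under the precision-recall curve), the following is a minimal sketch of one common estimator, average precision. It is an illustration only: the summary does not say how the study computed AUPRC, and the function name, labels, and scores below are invented toy values, not study data. The useful point for interpretation is that a classifier with no skill scores an AUPRC close to the prevalence of the positive class, so AUPRC values are best read against how rare the positive class is.

```python
def average_precision(labels, scores):
    """Average precision: mean of the precision at each rank where a positive appears.

    labels: list of 0/1 ground-truth values (1 = positive class).
    scores: list of model scores, higher meaning more likely positive.
    Tied scores keep their input order (an assumption of this sketch).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank documents from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


# Toy data (hypothetical): 2 positives among 8 documents.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)  # approximate no-skill AUPRC baseline
print(f"average precision = {ap:.4f}, prevalence baseline = {prevalence:.2f}")
```

In this toy case the two positives are ranked first and third, so the average precision is well above the 0.25 prevalence baseline.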
