Leveraging large language models to extract smoking history from clinical notes for lung cancer surveillance - Summary - MDSpire

  • By Ingrid Luo, Anna Graber-Naidich, Mengrui Zhang, Rakshit Kaushik, Grant M. Nieda, Tony Chen, Bo Gu, Eunji Choi, Victoria Y. Ding, Fatma Gunturkun, Mina Satoyoshi, Archana Bhat, Tae Yoon Lee, Chloe C. Su, Timothy John Ellis-Caleo, A. Solomon Henry, Manisha Desai, Leah M. Backhus, Natalie S. Lui, Ann Leung, Joel W. Neal, Allison W. Kurian, Curtis P. Langlotz, Heather A. Wakelee, Su-Ying Liang, Aparajita Khan, and Summer S. Han

  • November 28, 2025

Objective:

To use large language models (LLMs) to improve the accuracy and completeness of smoking history documentation in electronic health records (EHRs), in support of lung cancer surveillance.

Key Findings:
  • Generative LLMs achieved >96% accuracy across seven key smoking-related variables, including smoking status and history.
  • External validation showed robust generalizability with 97.5–98.8% accuracy across diverse patient populations.
  • Risk model-based surveillance incorporating smoking factors outperformed NCCN Guidelines in identifying second malignancies.
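
For readers who want to see how a per-variable accuracy figure like those above is typically computed, here is a minimal sketch. It is not the authors' evaluation code. The variable names (`smoking_status`, `pack_years`), the record layout, and the toy values are all hypothetical and chosen only for illustration; the summary does not list the seven variables or how they were encoded.

```python
from typing import Dict, List


def per_variable_accuracy(
    predicted: List[Dict[str, str]],
    reference: List[Dict[str, str]],
) -> Dict[str, float]:
    """Return, for each variable, the fraction of notes where the
    extracted value exactly matches the manually annotated reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one annotated note is required")

    accuracy = {}
    for variable in reference[0]:
        # A missing prediction (None) counts as a mismatch.
        matches = sum(
            1
            for pred, ref in zip(predicted, reference)
            if pred.get(variable) == ref[variable]
        )
        accuracy[variable] = matches / len(reference)
    return accuracy


# Hypothetical toy data: four notes, two illustrative variables.
reference = [
    {"smoking_status": "former", "pack_years": "20"},
    {"smoking_status": "current", "pack_years": "35"},
    {"smoking_status": "never", "pack_years": "0"},
    {"smoking_status": "former", "pack_years": "10"},
]
predicted = [
    {"smoking_status": "former", "pack_years": "20"},
    {"smoking_status": "current", "pack_years": "30"},  # one extraction error
    {"smoking_status": "never", "pack_years": "0"},
    {"smoking_status": "former", "pack_years": "10"},
]

accuracy = per_variable_accuracy(predicted, reference)
print(accuracy)
```

Exact-match scoring is the simplest choice; a real evaluation might instead allow a tolerance for numeric fields such as pack-years, which is an assumption this sketch does not make.
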
Interpretation:

The study demonstrates that generative LLMs can significantly improve the accuracy and completeness of smoking history documentation, which is critical for lung cancer surveillance and patient monitoring.

Limitations:
  • The findings come from the specific healthcare systems involved, which may limit their generalizability to other settings.
  • Potential LLM hallucinations were not systematically addressed in longitudinal contexts, raising concerns about reliability.
Conclusion:

Generative LLMs represent a promising advancement in extracting and harmonizing smoking histories from clinical documentation, which is crucial for enhancing lung cancer monitoring and improving patient outcomes.
