Teaching AI to Read Liquid Biopsies

Researchers explore whether large language models can support cfRNA biomarker discovery

June 26, 2026

The Pathologist

Objective:

To evaluate the ability of large language models (LLMs) to identify diagnostic biomarkers from cell-free RNA (cfRNA) data.

Approach:

Study Design: Researchers assessed several LLMs using published cfRNA datasets from three patient cohorts of varying diagnostic complexity.
Comparison Methodology: LLM-generated gene panels were compared with two baselines: randomly selected gene sets and panels derived from conventional differential expression analysis.
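
The random-panel comparison described above can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the gene names and the llm_panel list are hypothetical, and the scoring rule (mean panel expression ranked by AUC, assuming panel genes are higher in cases) is a simplification chosen for illustration. The summary does not state which classifier or metric the researchers used.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def panel_auc(expr, labels, panel):
    """Score each sample by the mean expression of the panel genes, then compute AUC."""
    scores = [statistics.fmean(sample[g] for g in panel) for sample in expr]
    return auc(scores, labels)


def random_panel_baseline(expr, labels, genes, size, n_draws, rng):
    """AUCs of n_draws random panels, each the same size as the candidate panel."""
    return [panel_auc(expr, labels, rng.sample(genes, size)) for _ in range(n_draws)]


# Synthetic stand-in for a cfRNA expression matrix (assumption, not study data):
# 60 samples, 200 genes, the first 10 genes shifted upward in cases.
rng = random.Random(0)
genes = [f"GENE_{i:03d}" for i in range(200)]
informative = set(genes[:10])
labels = [1] * 30 + [0] * 30
expr = [
    {g: rng.gauss(1.0 if (y == 1 and g in informative) else 0.0, 1.0) for g in genes}
    for y in labels
]

# Hypothetical LLM-proposed panel: four informative genes plus one noise gene.
llm_panel = ["GENE_000", "GENE_001", "GENE_002", "GENE_003", "GENE_150"]

llm_auc = panel_auc(expr, labels, llm_panel)
baseline = random_panel_baseline(expr, labels, genes, len(llm_panel), 500, rng)

# Share of random panels that match or beat the candidate panel.
frac_at_least = sum(b >= llm_auc for b in baseline) / len(baseline)

print(f"candidate panel AUC: {llm_auc:.3f}")
print(f"median random-panel AUC: {statistics.median(baseline):.3f}")
print(f"fraction of random panels >= candidate: {frac_at_least:.3f}")
```

A panel derived from differential expression could be scored with the same panel_auc function, which would put all three panel types on one scale. On real data, a held-out test set or cross-validation would be needed to avoid optimistic estimates.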

Key Findings:

LLM-selected gene panels outperformed random selections, indicating the models can identify biologically relevant candidates.
Performance was strongest in the tuberculosis dataset, with some LLM-generated panels performing similarly to traditional methods.
Models frequently selected genes related to immune and inflammatory pathways.
When asked to execute a complete biomarker discovery workflow, LLMs performed inconsistently relative to established machine learning approaches.

Interpretation:

At their current level of performance, LLMs cannot replace established methods.

Limitations:

Inconsistent adherence to instructions by LLMs.
Challenges with reproducibility of results.

Conclusion:

LLM-generated biomarker signatures require rigorous validation before clinical application and should be used alongside traditional bioinformatics methods.