Metagenomic Analysis of BAL Fluid Differentiates Lung Cancer and Pulmonary Infections
Overview
This study applied metagenomic next-generation sequencing (mNGS) to bronchoalveolar lavage fluid (BALF) from 402 patients to develop a multimodal machine learning model that differentiates lung cancer from pulmonary infections (bacterial, fungal, and tuberculous). The integrated model achieved AUCs of 0.937 in the training cohort and 0.847 in the test cohort, suggesting that mNGS-based approaches could support rapid, cost-effective differential diagnosis.
Background
Lung cancer and pulmonary infections often present with overlapping clinical and radiological features, complicating timely and accurate diagnosis. Traditional diagnostic methods can be slow or inconclusive, leading to misdiagnoses and inappropriate treatments. Metagenomic next-generation sequencing (mNGS) detects microbial pathogens and host genetic information in the same assay, making it a promising tool for comprehensive analysis. Integrating microbial profiles with host response data may improve differentiation between malignancies and infections while requiring minimal sample material and allowing rapid turnaround.
Data Highlights
| Group | Patients (n) | Median mNGS DNA reads, millions (IQR) | Median mNGS RNA reads, millions (IQR) |
|---|---|---|---|
| Lung cancer | 123 | 21.9 (18.0–27.6) | 19.1 (13.8–26.2) |
| Bacterial infection | 114 | Not specified | Not specified |
| Fungal infection | 79 | Not specified | Not specified |
| Pulmonary tuberculosis | 86 | Not specified | Not specified |
Key Findings
The integrated multimodal machine learning model (Model VI) combining microbial and host genomic features achieved an AUC of 0.937 in the training cohort and 0.847 in the test cohort for differentiating lung cancer from pulmonary infections.
Microbial community structure and specific taxa differed between the lung cancer and infection groups, with significant differences in β-diversity (PERMANOVA P ≤ 0.002).
Using a rule-in/rule-out strategy, the model differentiated lung cancer from tuberculosis, fungal infection, and bacterial infection with accuracies of 0.896, 0.915, and 0.907, respectively.
Host-derived features including gene expression, transposable element activity, immune cell composition, and tumor fraction from copy number variation contributed to diagnostic accuracy.
Most sequencing reads (>95%) were human, highlighting the importance of integrating host transcriptomic data alongside microbial profiling.
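The rule-in/rule-out strategy mentioned in the findings can be illustrated with a minimal Python sketch. The summary does not state how the study set its cutoffs, so everything here is an assumption made for illustration: the two thresholds (0.2 and 0.8), the function names, and the toy scores and labels are hypothetical and are not taken from the study. The idea shown is the general one: a model score above an upper cutoff rules lung cancer in, a score below a lower cutoff rules it out, and scores in between are left indeterminate, so accuracy is reported only over the cases that received a call.

```python
from typing import List, Optional, Tuple


def rule_in_rule_out(score: float,
                     rule_out: float = 0.2,
                     rule_in: float = 0.8) -> Optional[int]:
    """Return 1 (cancer ruled in), 0 (cancer ruled out), or None (indeterminate).

    The default thresholds are hypothetical, chosen only for illustration.
    """
    if rule_out >= rule_in:
        raise ValueError("rule_out threshold must be below rule_in threshold")
    if score >= rule_in:
        return 1
    if score <= rule_out:
        return 0
    return None


def evaluate(scores: List[float],
             labels: List[int],
             rule_out: float = 0.2,
             rule_in: float = 0.8) -> Tuple[float, float]:
    """Return (accuracy on decided cases, fraction of cases decided)."""
    calls = [rule_in_rule_out(s, rule_out, rule_in) for s in scores]
    decided = [(c, y) for c, y in zip(calls, labels) if c is not None]
    if not decided:
        return float("nan"), 0.0
    accuracy = sum(c == y for c, y in decided) / len(decided)
    coverage = len(decided) / len(scores)
    return accuracy, coverage


# Toy data, invented for this example (1 = lung cancer, 0 = infection).
toy_scores = [0.95, 0.85, 0.60, 0.10, 0.15, 0.90, 0.40, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]

accuracy, coverage = evaluate(toy_scores, toy_labels)
print(f"accuracy on decided cases: {accuracy:.3f}, coverage: {coverage:.2f}")
```

A design point worth noting: widening the gap between the two thresholds usually raises accuracy on decided cases but leaves more patients without a call, so accuracy from such a strategy is best read together with the fraction of cases decided.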
Clinical Implications
The mNGS-based multimodal diagnostic approach offers a rapid, minimally invasive, and cost-effective tool to accurately distinguish lung cancer from pulmonary infections, potentially reducing diagnostic delays and inappropriate treatments. Incorporating both microbial and host genomic data enhances diagnostic precision, supporting more informed clinical decision-making in patients with ambiguous lung lesions. This method may streamline workflows by requiring fewer samples and providing results within 24 hours.
Conclusion
This study demonstrates that integrating microbial and host genomic features from BALF mNGS data via machine learning enables accurate differentiation of lung cancer from pulmonary infections. Such an approach holds promise for improving early diagnosis and guiding appropriate management of complex pulmonary diseases.
References
Metagenomic Analysis of Bronchoalveolar Lavage Identifies Distinct Pulmonary Disorders, 2024