A device-invariant multi-modal learning framework for respiratory disease classification - Report - MDSpire

By Mo Yang, Xuefei Liu, Wei Du, Yang Liu, Wenyu Zhu, Zhaoyang Bu, Jiaxuan Mao, Qian Wang, Si Chen, Min Zhou, and Jie-ming Qu

February 26, 2026

Multi-Modal Deep Learning for Device-Invariant Respiratory Disease Classification

Overview

A multimodal deep learning framework was developed to classify multiple adult respiratory diseases from cough sounds, demographics, and symptoms while remaining robust to differences between recording devices. Evaluated on a multi-center dataset of over 10,000 cases, the model achieved AUROC scores of 0.9698 for chronic obstructive pulmonary disease (COPD), 0.8483 for lower respiratory tract infection (LRTI), and 0.8720 for pulmonary shadows, indicating robust cross-device generalization.

Background

Respiratory diseases are a leading cause of morbidity worldwide, with conditions like chronic obstructive pulmonary disease (COPD) requiring early and accurate diagnosis for effective management. Traditional diagnostic methods often rely on clinical visits and specialized equipment, limiting accessibility. Advances in cough sound analysis using deep learning have enabled smartphone-based screening, but device variability and population diversity pose challenges. Integrating multimodal data and enforcing device-invariant feature learning can enhance diagnostic accuracy and scalability in real-world settings.

Data Highlights

Respiratory Condition                              AUROC
Chronic Obstructive Pulmonary Disease (COPD)       0.9698
Lower Respiratory Tract Infection (LRTI)           0.8483
Pulmonary Shadows (PS)                             0.8720
Overall Comorbidity Identification (7 diseases)    0.8907
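
For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it per condition in a multi-label setting. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study, and how the study aggregated its overall figure is not stated here.

```python
def auroc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy multi-label example (hypothetical values): one AUROC per condition.
toy_labels = {"COPD": [1, 0, 1, 0], "LRTI": [0, 1, 1, 0]}
toy_scores = {"COPD": [0.9, 0.2, 0.7, 0.4], "LRTI": [0.3, 0.8, 0.4, 0.6]}
per_label = {name: auroc(toy_labels[name], toy_scores[name]) for name in toy_labels}
print(per_label)
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be the usual choice at the scale of 10,000 cases.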

Key Findings

  • The proposed multimodal framework jointly models cough acoustics, demographic data, and symptom descriptions for multi-label respiratory disease classification.
  • An adversarial branch in the audio encoder enforces device-invariant feature learning, reducing the effect of the recording device on model performance.
  • An invariant risk minimization (IRM)-augmented loss further enhances robustness to non-structural shifts across devices.
  • Evaluated on a real-world, multi-center dataset of over 10,000 cases spanning seven respiratory diseases, the model achieved AUROC scores of 0.9698 for COPD, 0.8483 for LRTI, and 0.8720 for pulmonary shadows.
  • The method identified comorbidities across the seven diseases with an overall AUROC of 0.8907.
  • Extensive experiments confirmed improved cross-device generalization for cough-based respiratory disease diagnosis.
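
To make the IRM idea in the findings above concrete, the sketch below shows one common form of the penalty, often called IRMv1: each device is treated as an environment, and the loss adds the squared gradient of that environment's risk with respect to a dummy scale fixed at 1.0. This summary does not give the study's exact loss, so the IRMv1 form, the function names, the device keys `phone_a` and `phone_b`, and the toy values are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_risk(logits, labels):
    """Mean binary cross-entropy for one environment (here: one device)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


def irm_penalty(logits, labels):
    """Squared gradient of the risk w.r.t. a dummy scale w, evaluated at w = 1.

    For cross-entropy on logits w * z, d(risk)/dw at w = 1 is the mean of
    (sigmoid(z) - y) * z, so no autograd is needed in this toy version.
    """
    grad = sum((sigmoid(z) - y) * z for z, y in zip(logits, labels)) / len(logits)
    return grad ** 2


def irm_loss(batches_by_device, penalty_weight=1.0):
    """Average risk over devices plus the weighted average IRM penalty."""
    n = len(batches_by_device)
    risk = sum(bce_risk(z, y) for z, y in batches_by_device.values()) / n
    penalty = sum(irm_penalty(z, y) for z, y in batches_by_device.values()) / n
    return risk + penalty_weight * penalty


# Hypothetical per-device batches of (logits, labels) for a single disease label.
toy_batches = {
    "phone_a": ([2.0, -1.0, 0.5], [1, 0, 1]),
    "phone_b": ([0.3, -0.2, 1.5], [0, 0, 1]),
}
print(irm_loss(toy_batches, penalty_weight=10.0))
```

In a real training loop the penalty would be computed with automatic differentiation and combined with the adversarial device classifier described above; the closed-form gradient here only serves to keep the example dependency-free.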

Clinical Implications

This scalable AI-based approach could enable respiratory disease screening from smartphone-recorded cough sounds, supporting self-management and early detection in diverse populations. The device-invariant design is intended to keep performance consistent across recording devices, which would ease broader clinical adoption. Incorporating multimodal data improves diagnostic precision and may reduce reliance on specialized pulmonary function tests.

Conclusion

The study presents a robust, multimodal deep learning framework that effectively classifies multiple respiratory diseases across heterogeneous devices, advancing the clinical applicability of cough-based screening tools. This approach holds promise for scalable, accessible respiratory disease diagnosis in real-world settings.
