Improving respiratory disease detection through SSL-enhanced acoustic analysis and exercise-rest measurements

By Álvaro Vera-López, Darío Tilves-Santiago, José Manuel Ramírez-Sánchez, Laura Docío-Fernández, Carmen García-Mateo, María Bustillo-Casado, and Alejandro García-Caballero

June 24, 2026

Objective:

To evaluate a generalized screening model that combines acoustic analysis of voice and cough recorded under physiological stress with machine learning, with the aim of improving detection of respiratory disorders, particularly Post-Acute Sequelae of SARS-CoV-2 (PASC).

Approach:
  • Dataset: Used the DICOPERIA-Voice dataset (n = 154), which provides recordings of sustained vowel phonation (/a/) and voluntary coughing, both at rest and after a physiological stress protocol.
  • Feature Extraction: Employed a dual-feature extraction strategy combining traditional acoustic biomarkers with high-dimensional Self-Supervised Learning (SSL) embeddings from wav2vec 2.0, WavLM, and HuBERT.
  • Classification: Performed binary classification (PASC vs. Healthy) using Logistic Regression, evaluated via stratified 5-fold cross-validation.
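
The evaluation protocol above can be illustrated with a minimal, standard-library sketch of stratified fold assignment and the F1 metric. It is not the authors' code: the function names `stratified_folds` and `f1_score`, the round-robin assignment, and the 90/64 class split are assumptions made for illustration (the summary reports only n = 154, not the class balance).

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that roughly preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = PASC, 0 = Healthy. The 90/64 split is made up.
labels = [1] * 90 + [0] * 64
folds = stratified_folds(labels, k=5)
for fold_id, test_idx in enumerate(folds):
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold_id}: {len(test_idx)} samples, {positives} labeled PASC")
```

In each round, one fold serves as the test set and the remaining four as training data for the classifier. Stratification matters at this sample size because an unlucky random split could leave a fold with too few examples of one class for the F1-score to be meaningful.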
Key Findings:
  • Physical exertion significantly improved classification performance and reduced model variability across all tasks.
  • Fusion of acoustic features with WavLM and wav2vec 2.0 achieved peak F1-scores of 82.2% for vowel phonation and 80.8% for coughing in post-exercise conditions.
  • Cross-task late fusion, which aggregates the outputs of the task-specific models, reached the highest overall performance, with an F1-score of 87.7%.
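
As a rough illustration of late fusion, the sketch below combines the probabilities produced by separate task-specific classifiers into one decision per speaker. The summary does not state how the outputs were aggregated, so the weighted average, the 0.5 threshold, the function name `late_fusion`, and the example scores are all assumptions.

```python
def late_fusion(task_probs, weights=None, threshold=0.5):
    """Fuse per-task PASC probabilities for one speaker.

    task_probs maps a task name to that task's predicted PASC probability.
    Returns the fused probability and the binary decision (1 = PASC).
    """
    if weights is None:
        # Assumption: every task contributes equally unless told otherwise.
        weights = {task: 1.0 for task in task_probs}
    total = sum(weights[task] for task in task_probs)
    fused = sum(weights[task] * p for task, p in task_probs.items()) / total
    return fused, int(fused >= threshold)


# Hypothetical scores for one speaker from two task-specific classifiers.
scores = {"vowel_post_exercise": 0.62, "cough_post_exercise": 0.44}
fused, decision = late_fusion(scores)
print(f"fused probability: {fused:.2f}, decision: {decision}")
```

Fusing at the decision level keeps each task's model independent, so a recording type can be added or dropped without retraining the others.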
Interpretation:

Incorporating Self-Supervised Learning representations into acoustic analysis improves the sensitivity of voice-based screening, while post-exercise measurements enhance robustness and consistency.

Limitations:
  • The study relies on a single dataset (DICOPERIA-Voice, n = 154), which may limit generalizability.
  • Further validation is needed before integration into routine clinical assessments.
Conclusion:

The proposed framework offers a scalable and objective method for detecting respiratory and vocal sequelae in chronic or post-viral conditions.
