Improving respiratory disease detection through SSL-enhanced acoustic analysis and exercise-rest measurements

By
Álvaro Vera-López
Darío Tilves-Santiago
José Manuel Ramírez-Sánchez
Laura Docío-Fernández
Carmen García-Mateo
María Bustillo-Casado
Alejandro García-Caballero
June 24, 2026

Frontiers in Medicine

Objective:

To evaluate a generalized screening model integrating stress-induced acoustic analysis with machine learning for improved detection of respiratory disorders, particularly in the context of Post-Acute Sequelae of SARS-CoV-2 (PASC).

Approach:

Dataset: Used the DICOPERIA-Voice dataset (n = 154), which contains recordings of sustained vowel phonation (/a/) and voluntary coughing captured at rest and after a physiological stress protocol.
Feature Extraction: Employed a dual-feature extraction strategy combining traditional acoustic biomarkers with high-dimensional Self-Supervised Learning (SSL) embeddings from wav2vec 2.0, WavLM, and HuBERT.
Classification: Performed binary classification (PASC vs. Healthy) using Logistic Regression, evaluated via stratified 5-fold cross-validation.
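
The evaluation setup described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the two feature types are fused by simple concatenation (the summary does not state the fusion rule), and the function names `fuse_features` and `stratified_folds` and the toy labels are hypothetical. It uses only the Python standard library; in practice a library routine such as scikit-learn's `StratifiedKFold` would typically do the splitting.

```python
import random
from collections import defaultdict


def fuse_features(acoustic, ssl_embedding):
    """Assumed fusion rule: concatenate both feature vectors for one recording."""
    return list(acoustic) + list(ssl_embedding)


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps the class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share per class.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Hypothetical toy labels: 1 = PASC, 0 = Healthy.
labels = [1] * 10 + [0] * 10
folds = stratified_folds(labels)

for k, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    # A classifier (for example logistic regression) would be fit on
    # train_idx and scored on test_idx here.
    print(f"fold {k}: {len(train_idx)} train, {len(test_idx)} test")
```

Stratifying matters with a cohort of this size: with only 154 participants, an unstratified split could leave a fold with too few positive cases for a stable F1 estimate.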

Key Findings:

Physical exertion significantly improved classification performance and reduced model variability across all tasks.
Fusion of acoustic features with WavLM and wav2vec 2.0 achieved peak F1-scores of 82.2% for vowel phonation and 80.8% for coughing in post-exercise conditions.
Late fusion of models across tasks reached the highest overall performance, with an F1-score of 87.7%.
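
To illustrate what combining task-specific models at the decision level can look like, the sketch below averages the PASC probabilities produced by a vowel model and a cough model for each subject, then thresholds the result and scores it with F1. The averaging rule, the 0.5 threshold, the function names, and all numbers are assumptions made for illustration; the summary does not specify how the study aggregated its models.

```python
def late_fusion(prob_vowel, prob_cough, threshold=0.5):
    """Average per-subject probabilities from two task models, then threshold."""
    fused = [(pv + pc) / 2 for pv, pc in zip(prob_vowel, prob_cough)]
    return [1 if p >= threshold else 0 for p in fused]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-subject outputs: 1 = PASC, 0 = Healthy.
y_true = [1, 1, 1, 0, 0, 0]
prob_vowel = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]
prob_cough = [0.8, 0.7, 0.2, 0.3, 0.3, 0.4]

y_pred = late_fusion(prob_vowel, prob_cough)
print("fused predictions:", y_pred)
print("fused F1:", f1_score(y_true, y_pred))
```

Averaging lets a confident model offset an uncertain one for the same subject, which is one plausible reason decision-level fusion can outperform either task alone.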

Interpretation:

Incorporating Self-Supervised Learning representations into acoustic analysis improves the sensitivity of voice-based screening, while post-exercise measurements enhance robustness and consistency.

Limitations:

The study relies on a single dataset (DICOPERIA-Voice, n = 154), which may limit generalizability.
Further validation is needed before integration into routine clinical assessments.