Clinical Report: SpeechCARE Multimodal Pipeline for Cognitive Impairment Detection
Overview
SpeechCARE is a novel multimodal transformer-based pipeline that detects cognitive impairment from brief speech recordings, classifying each speaker as having Alzheimer’s disease, having mild cognitive impairment, or being a healthy control. Trained on a multilingual dataset, it achieved an F1-score of 72.11% on a held-out test set and generalized across languages (English, Spanish, Mandarin) and speech tasks.
Background
Alzheimer’s disease and related dementias affect a significant portion of adults over 60, with many cases remaining undiagnosed due to subtle symptoms and limited biomarker availability. Speech contains acoustic and linguistic markers indicative of cognitive decline, but prior models have struggled with performance and generalizability. Transformer models, which capture long-range dependencies in speech and language, offer promise but have been underutilized in this domain due to data scarcity and complexity. The National Institute on Aging’s PREPARE challenge provided a diverse multilingual dataset to advance early detection tools.
Data Highlights
Metric                     Value (Mean ± SD)
Micro AUC                  86.83% ± 0.46%
Weighted AUC               80.67% ± 0.65%
F1-Score                   72.11% ± 0.44%
Micro Precision (AP)       74.73% ± 1.21%
Weighted Precision (AP)    73.50% ± 0.66%
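The report does not state how the micro and weighted averages were computed. As a point of reference, the sketch below shows the usual one-vs-rest definitions for a three-class problem: micro AUC pools every (sample, class) pair into one binary problem, while weighted AUC averages the per-class AUCs by class frequency. The function names and the toy labels and probabilities are illustrative assumptions, not values or code from the study.

```python
def binary_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_auc(y_true, probs):
    """Pool all (sample, class) pairs into a single binary problem."""
    n_classes = len(probs[0])
    flat_labels = [1 if y == k else 0 for y in y_true for k in range(n_classes)]
    flat_scores = [p for row in probs for p in row]
    return binary_auc(flat_labels, flat_scores)


def weighted_auc(y_true, probs):
    """Average one-vs-rest AUCs, weighting each class by its sample count."""
    n_classes = len(probs[0])
    total = 0.0
    for k in range(n_classes):
        labels_k = [1 if y == k else 0 for y in y_true]
        scores_k = [row[k] for row in probs]
        total += sum(labels_k) * binary_auc(labels_k, scores_k)
    return total / len(y_true)


# Toy example (hypothetical): 0 = healthy control, 1 = MCI, 2 = AD.
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
]
print("micro AUC:", micro_auc(y_true, probs))
print("weighted AUC:", weighted_auc(y_true, probs))
```

Because micro averaging pools all classes, it is dominated by the most frequent class, which is one plausible reason the two AUC figures in a report like this can differ by several points.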
Key Findings
SpeechCARE integrates acoustic (mHuBERT) and linguistic (mGTE) embeddings with demographic data using an Adaptive Gating Fusion mechanism.
Incorporating age as a demographic feature significantly improved predictive accuracy.
The model achieved a balanced micro AUC of 86.83% and an F1-score of 72.11% on a held-out multilingual test set (English, Spanish, Mandarin).
Threshold optimization enhanced recall for Mild Cognitive Impairment detection.
Fairness analysis revealed moderate disparities, particularly among Spanish speakers, indicating areas for further improvement.
SpeechCARE complements blood-based biomarkers by capturing functional speech deficits, supporting scalable early detection.
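The report names an Adaptive Gating Fusion mechanism but does not describe its internals. The following is a minimal sketch of one common form of gated fusion, assuming each modality has already been projected to a vector of the same size: a small linear layer scores the modalities from their concatenation, a softmax turns the scores into weights, and the fused vector is the weighted sum. The function names, vector sizes, and random gate parameters are all hypothetical and should not be read as the SpeechCARE implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_fusion(modality_vectors, gate_weights, gate_bias):
    """Fuse equally sized modality vectors with input-dependent softmax gates.

    The gate sees the concatenation of all modalities and emits one weight
    per modality; the output is the gate-weighted sum of the vectors.
    """
    concat = [v for vec in modality_vectors for v in vec]
    gates = softmax(linear(concat, gate_weights, gate_bias))
    dim = len(modality_vectors[0])
    fused = [
        sum(g * vec[i] for g, vec in zip(gates, modality_vectors))
        for i in range(dim)
    ]
    return fused, gates


# Toy inputs (hypothetical): real embeddings would be far larger.
random.seed(0)
acoustic = [0.2, -0.1, 0.4, 0.0]      # stand-in for a projected acoustic embedding
linguistic = [0.5, 0.3, -0.2, 0.1]    # stand-in for a projected text embedding
demographic = [0.7, 0.7, 0.7, 0.7]    # stand-in for a projected age feature
modalities = [acoustic, linguistic, demographic]

n_inputs = sum(len(v) for v in modalities)
gate_weights = [
    [random.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in modalities
]
gate_bias = [0.0] * len(modalities)

fused, gates = gated_fusion(modalities, gate_weights, gate_bias)
print("gate weights per modality:", gates)
print("fused representation:", fused)
```

In a trained model the gate parameters would be learned jointly with the classifier, so the weight given to speech, text, or demographics can vary from one recording to the next.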
Clinical Implications
SpeechCARE offers a scalable, non-invasive screening tool for early cognitive impairment detection across diverse languages and speech contexts, potentially aiding clinicians in identifying at-risk individuals earlier. Its integration of speech and demographic data enhances accuracy, but attention to language-specific fairness is needed to ensure equitable application. This approach may supplement existing biomarker methods, facilitating broader screening in clinical and community settings.
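One simple way to monitor language-specific fairness is to compare a per-language metric and report the largest gap between groups. The sketch below does this for recall on one diagnostic class. The report does not say which fairness metric was used, so the choice of recall, the function names, and the toy labels are assumptions made for illustration only.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups, positive_label):
    """Recall for one class, computed separately within each group."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive_label:
            totals[group] += 1
            if pred == positive_label:
                hits[group] += 1
    # Groups with no true positives are omitted rather than scored as zero.
    return {group: hits[group] / totals[group] for group in totals}


def max_gap(per_group):
    """Largest difference in the metric between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy data (hypothetical): labels are HC, MCI, AD; groups are language codes.
y_true = ["MCI", "MCI", "MCI", "MCI", "HC", "MCI", "MCI"]
y_pred = ["MCI", "MCI", "HC", "MCI", "HC", "MCI", "AD"]
groups = ["en", "en", "es", "es", "es", "zh", "zh"]

per_language = recall_by_group(y_true, y_pred, groups, positive_label="MCI")
print("MCI recall by language:", per_language)
print("largest gap:", max_gap(per_language))
```

A gap near zero suggests similar sensitivity across languages; a large gap flags a group, such as the Spanish speakers noted in the findings, where cases are more likely to be missed.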
Conclusion
SpeechCARE demonstrates that a multimodal transformer-based approach can effectively detect cognitive impairment from brief speech samples with strong multilingual generalizability. Its promising performance supports further development and clinical integration for early, accessible cognitive screening.
References
National Institute on Aging PREPARE Challenge 2023: Multilingual Speech Dataset for Cognitive Impairment Detection
mHuBERT and mGTE Transformer Models for Acoustic and Linguistic Embeddings