SpeechCARE: dynamic multimodal modeling for cognitive screening in diverse linguistic and speech task contexts

By
Hossein Azadmaleki
Yasaman Haghbin
Sina Rashidi
Mohammad Javad Momeni Nezhad
Ali Zolnour
Maryam Zolnoori
November 17, 2025

npj Digital Medicine

Overview

SpeechCARE is a novel multimodal transformer-based pipeline that detects cognitive impairment from brief speech recordings, classifying Alzheimer’s Disease, Mild Cognitive Impairment, and healthy controls. Trained on a multilingual dataset, it achieved a 72.11% F1-score and demonstrated strong generalizability across languages and speech tasks.

Background

Alzheimer’s disease and related dementias affect a significant portion of adults over 60, with many cases remaining undiagnosed due to subtle symptoms and limited biomarker availability. Speech contains acoustic and linguistic markers indicative of cognitive decline, but prior models have struggled with performance and generalizability. Transformer models, which capture long-range dependencies in speech and language, offer promise but have been underutilized in this domain due to data scarcity and complexity. The National Institute on Aging’s PREPARE challenge provided a diverse multilingual dataset to advance early detection tools.

Data Highlights

Metric	Value (Mean ± SD)
Micro AUC	86.83% ± 0.46%
Weighted AUC	80.67% ± 0.65%
F1-Score	72.11% ± 0.44%
Micro Precision (AP)	74.73% ± 1.21%
Weighted Precision (AP)	73.50% ± 0.66%
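
The table reports both micro and weighted AUC, which average a three-class problem in different ways. The sketch below shows one common way to compute each, assuming one-vs-rest scoring: micro pools every (sample, class) pair into a single binary problem, while weighted averages per-class AUCs by class frequency. The function names and toy inputs are illustrative assumptions, not the authors' evaluation code.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """Pool all (sample, class) pairs into one binary problem, then compute a single AUC."""
    labels: List[int] = []
    scores: List[float] = []
    for y, row in zip(y_true, probs):
        for c in range(n_classes):
            labels.append(1 if y == c else 0)
            scores.append(row[c])
    return binary_auc(labels, scores)


def weighted_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """One-vs-rest AUC per class, averaged with weights equal to each class's sample count."""
    total = 0.0
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in probs]
        total += binary_auc(labels, scores) * sum(labels)
    return total / len(y_true)


# Toy example: 0 = healthy control, 1 = MCI, 2 = AD (hypothetical labels and probabilities).
toy_labels = [0, 0, 1, 2]
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.5, 0.3],
]
print(micro_auc(toy_labels, toy_probs, 3), weighted_auc(toy_labels, toy_probs, 3))
```

Because micro averaging pools all decisions, it tends to be dominated by the larger classes, whereas the weighted score exposes weak per-class ranking more directly. That is one plausible reason the two figures in the table differ.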

Key Findings

SpeechCARE integrates acoustic (mHuBERT) and linguistic (mGTE) embeddings with demographic data using an Adaptive Gating Fusion mechanism.
Incorporating age as a demographic feature significantly improved predictive accuracy.
The model achieved a micro AUC of 86.83% and an F1-score of 72.11% on a held-out multilingual test set (English, Spanish, Mandarin).
Threshold optimization enhanced recall for Mild Cognitive Impairment detection.
Fairness analysis revealed moderate disparities, particularly among Spanish speakers, indicating areas for further improvement.
SpeechCARE complements blood-based biomarkers by capturing functional speech deficits, supporting scalable early detection.
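
The gating-based fusion named in the first finding can be pictured with a small sketch. It assumes a generic softmax gate that weights per-modality class logits before summing them. The summary does not describe the internals of SpeechCARE's Adaptive Gating Fusion, so the function names, the gate scores, and the logit values here are all hypothetical.

```python
import math
from typing import Dict, List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(class_logits: Dict[str, List[float]], gate_scores: Dict[str, float]) -> List[float]:
    """Fuse per-modality class logits as a weighted sum.

    The weights are a softmax over one gate score per modality. In a trained
    model the gate scores would come from a learned layer conditioned on the
    input; here they are passed in directly (an assumption for illustration).
    """
    names = list(class_logits)
    weights = softmax([gate_scores[name] for name in names])
    n_classes = len(class_logits[names[0]])
    fused = [0.0] * n_classes
    for weight, name in zip(weights, names):
        for i, value in enumerate(class_logits[name]):
            fused[i] += weight * value
    return fused


# Hypothetical logits over (healthy control, MCI, AD) from each input stream.
logits = {
    "acoustic": [3.0, 0.0, 0.0],
    "linguistic": [0.0, 3.0, 0.0],
    "demographic": [0.0, 0.0, 3.0],
}
equal_gates = {"acoustic": 0.0, "linguistic": 0.0, "demographic": 0.0}
acoustic_heavy = {"acoustic": 100.0, "linguistic": 0.0, "demographic": 0.0}

print(gated_fusion(logits, equal_gates))
print(gated_fusion(logits, acoustic_heavy))
```

With equal gate scores the fusion reduces to a plain average of the three streams. A dominant gate score lets one stream decide the output, which is the behavior that makes the weighting adaptive per input.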

Clinical Implications

SpeechCARE offers a scalable, non-invasive screening tool for early cognitive impairment detection across diverse languages and speech contexts, potentially aiding clinicians in identifying at-risk individuals earlier. Its integration of speech and demographic data enhances accuracy, but attention to language-specific fairness is needed to ensure equitable application. This approach may supplement existing biomarker methods, facilitating broader screening in clinical and community settings.

Conclusion

SpeechCARE demonstrates that a multimodal transformer-based approach can effectively detect cognitive impairment from brief speech samples with strong multilingual generalizability. Its promising performance supports further development and clinical integration for early, accessible cognitive screening.

References

National Institute on Aging PREPARE Challenge 2023: Multilingual Speech Dataset for Cognitive Impairment Detection
mHuBERT and mGTE Transformer Models for Acoustic and Linguistic Embeddings

Original Source(s)

SpeechCARE: dynamic multimodal modeling for cognitive screening in diverse linguistic and speech task contexts
