SpeechCARE: dynamic multimodal modeling for cognitive screening in diverse linguistic and speech task contexts - Report - MDSpire

By Hossein Azadmaleki, Yasaman Haghbin, Sina Rashidi, Mohammad Javad Momeni Nezhad, Ali Zolnour, Maryam Zolnoori

November 17, 2025

Clinical Report: SpeechCARE Multimodal Pipeline for Cognitive Impairment Detection

Overview

SpeechCARE is a multimodal, transformer-based pipeline that screens for cognitive impairment from brief speech recordings, assigning each speaker to one of three classes: Alzheimer’s disease, mild cognitive impairment (MCI), or healthy control. Trained on a multilingual dataset, it achieved an F1-score of 72.11% and a micro AUC of 86.83% on a held-out test set covering English, Spanish, and Mandarin, and it generalized across languages and speech tasks.

Background

Alzheimer’s disease and related dementias affect a significant portion of adults over 60, with many cases remaining undiagnosed due to subtle symptoms and limited biomarker availability. Speech contains acoustic and linguistic markers indicative of cognitive decline, but prior models have struggled with performance and generalizability. Transformer models, which capture long-range dependencies in speech and language, offer promise but have been underutilized in this domain due to data scarcity and complexity. The National Institute on Aging’s PREPARE challenge provided a diverse multilingual dataset to advance early detection tools.

Data Highlights

Metric                    Value (Mean ± SD)
Micro AUC                 86.83% ± 0.46%
Weighted AUC              80.67% ± 0.65%
F1-Score                  72.11% ± 0.44%
Micro Precision (AP)      74.73% ± 1.21%
Weighted Precision (AP)   73.50% ± 0.66%
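
The table reports both a micro and a weighted AUC, but the report does not say how the two averages were computed. The sketch below shows one common reading for a three-class problem, assuming a one-vs-rest setup: the micro average pools every (sample, class) pair into a single binary problem, while the weighted average computes one AUC per class and weights it by that class's share of the samples. The function names and the toy data are illustrative assumptions, not taken from SpeechCARE.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """Pool all (sample, class) pairs into one binary problem, then score it."""
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for y, row in zip(y_true, probs):
        for c in range(n_classes):
            flat_labels.append(1 if y == c else 0)
            flat_scores.append(row[c])
    return binary_auc(flat_labels, flat_scores)


def weighted_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """One-vs-rest AUC per class, weighted by each class's share of the samples."""
    total = len(y_true)
    result = 0.0
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in probs]
        result += (sum(labels) / total) * binary_auc(labels, scores)
    return result


# Synthetic example only: 0 = control, 1 = MCI, 2 = AD (label order is an assumption).
toy_labels = [0, 0, 1, 1, 2, 2]
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.5, 0.2],
    [0.4, 0.3, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
]
print("micro AUC:", micro_auc(toy_labels, toy_probs, 3))
print("weighted AUC:", weighted_auc(toy_labels, toy_probs, 3))
```

Because the micro average pools all classes, it can come out higher than the weighted average when the model's scores are well calibrated across classes yet one class is hard to separate from the rest. That would be consistent with, though not proven by, the gap between the two AUC rows.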

Key Findings

  • SpeechCARE integrates acoustic (mHuBERT) and linguistic (mGTE) embeddings with demographic data using an Adaptive Gating Fusion mechanism.
  • Incorporating age as a demographic feature significantly improved predictive accuracy.
  • On a held-out multilingual test set (English, Spanish, Mandarin), the model achieved a micro AUC of 86.83% and an F1-score of 72.11%.
  • Threshold optimization enhanced recall for Mild Cognitive Impairment detection.
  • Fairness analysis revealed moderate disparities, particularly among Spanish speakers, indicating areas for further improvement.
  • SpeechCARE complements blood-based biomarkers by capturing functional speech deficits, supporting scalable early detection.
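
Two of the findings above, gated fusion of modalities and threshold optimization for MCI, can be illustrated with a small sketch. The report does not describe the internals of the Adaptive Gating Fusion mechanism or the thresholding procedure, so everything below is an assumption: the gate is reduced to one score per modality (in a trained model these scores would typically come from a learned layer), the fusion is a softmax-weighted sum of per-modality class logits, and the decision rule simply predicts MCI whenever its probability clears a tunable threshold. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List

CLASSES = ["control", "mci", "ad"]  # label order is an assumption


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(modality_logits: Dict[str, List[float]],
                 gate_scores: Dict[str, float]) -> List[float]:
    """Blend per-modality class logits with softmax-normalized gate weights.

    gate_scores stands in for the output of a learned gating layer; here the
    scores are supplied by hand purely for illustration.
    """
    names = list(modality_logits)
    weights = softmax([gate_scores[name] for name in names])
    fused = [0.0] * len(CLASSES)
    for weight, name in zip(weights, names):
        for c, logit in enumerate(modality_logits[name]):
            fused[c] += weight * logit
    return softmax(fused)


def decide(probs: List[float], mci_threshold: float = 0.5) -> str:
    """Predict MCI if its probability clears the threshold, else the top class."""
    mci = CLASSES.index("mci")
    if probs[mci] >= mci_threshold:
        return "mci"
    best = max(range(len(CLASSES)), key=lambda c: probs[c])
    return CLASSES[best]


# Hypothetical logits for one speaker from three branches.
example_logits = {
    "acoustic": [1.2, 0.9, 0.1],
    "linguistic": [0.4, 1.0, 0.3],
    "demographic": [0.2, 0.3, 0.1],
}
example_gates = {"acoustic": 0.5, "linguistic": 1.0, "demographic": -0.5}

fused_probs = gated_fusion(example_logits, example_gates)
print("fused probabilities:", fused_probs)
print("default threshold:", decide(fused_probs, mci_threshold=0.5))
print("lowered threshold:", decide(fused_probs, mci_threshold=0.3))
```

Lowering `mci_threshold` can only add MCI predictions under this rule, which raises MCI recall at the cost of more false positives. In practice such a threshold would be tuned on validation data rather than set by hand.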

Clinical Implications

SpeechCARE offers a scalable, non-invasive screening tool for early cognitive impairment detection across diverse languages and speech contexts, potentially aiding clinicians in identifying at-risk individuals earlier. Its integration of speech and demographic data enhances accuracy, but attention to language-specific fairness is needed to ensure equitable application. This approach may supplement existing biomarker methods, facilitating broader screening in clinical and community settings.
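
The language-specific fairness concern can be made concrete by scoring each language group separately and comparing the results. The report does not state which fairness metric was used, so the sketch below assumes a simple one: macro-averaged F1 per language, with the disparity taken as the gap between the best and worst group. The records are synthetic and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

CLASSES = ["control", "mci", "ad"]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    scores: List[float] = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_by_group(records: Sequence[Tuple[str, str, str]],
                classes: Sequence[str]) -> Dict[str, float]:
    """records holds (language, true_label, predicted_label) triples."""
    truths: Dict[str, List[str]] = defaultdict(list)
    preds: Dict[str, List[str]] = defaultdict(list)
    for language, truth, pred in records:
        truths[language].append(truth)
        preds[language].append(pred)
    return {lang: macro_f1(truths[lang], preds[lang], classes) for lang in truths}


def max_gap(scores: Dict[str, float]) -> float:
    """Difference between the best- and worst-scoring group."""
    return max(scores.values()) - min(scores.values())


# Synthetic predictions for illustration only; not SpeechCARE output.
toy_records = [
    ("english", "control", "control"),
    ("english", "mci", "mci"),
    ("english", "ad", "ad"),
    ("spanish", "control", "control"),
    ("spanish", "mci", "control"),
    ("spanish", "ad", "ad"),
    ("mandarin", "control", "control"),
    ("mandarin", "mci", "mci"),
    ("mandarin", "ad", "mci"),
]
per_language = f1_by_group(toy_records, CLASSES)
for language, score in sorted(per_language.items()):
    print(language, round(score, 3))
print("largest gap:", round(max_gap(per_language), 3))
```

A per-group breakdown like this only flags where performance differs. Small groups give noisy estimates, so any gap should be read alongside the group sizes.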

Conclusion

SpeechCARE demonstrates that a multimodal transformer-based approach can effectively detect cognitive impairment from brief speech samples with strong multilingual generalizability. Its promising performance supports further development and clinical integration for early, accessible cognitive screening.

References

  1. National Institute on Aging PREPARE Challenge 2023: Multilingual Speech Dataset for Cognitive Impairment Detection
  2. mHuBERT and mGTE Transformer Models for Acoustic and Linguistic Embeddings