Multimodal machine learning for video based single question mental health assessment - Scorecard - MDSpire

By Bradley Grimm, Pernille Yilmam, Brett Talbot, and Loren Larsen
December 16, 2025

Clinical Scorecard: Integrated Machine Learning Approaches for Video-Based Assessment of Mental Health with a Single Question

At a Glance

Category | Detail
Condition | Depression, Anxiety, and PTSD
Key Mechanisms | Multimodal machine learning model integrating textual (MPNet) and voice prosody (HuBERT) analysis of a single open-ended video response
Target Population | Adults undergoing mental health screening across diverse demographic groups
Care Setting | Primary care and clinical settings with limited mental health provider availability

Key Highlights

  • A single open-ended question predicts PHQ-9, GAD-7, and PCL-5 scores simultaneously, with accuracy comparable to traditional questionnaires.
  • The approach reduces assessment time by 64.6% (78.4 seconds vs. 221.7 seconds) compared to sequential administration of three separate surveys.
  • User acceptance was high: 90.7% were willing to use video-based screening, and only 1.4% were unwilling to participate.
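
The 64.6% time reduction follows directly from the two reported durations. A minimal arithmetic check in Python; the function and variable names are illustrative and do not come from the source:

```python
def relative_reduction(baseline_s: float, new_s: float) -> float:
    """Fractional reduction of new_s relative to baseline_s."""
    return (baseline_s - new_s) / baseline_s

# Durations reported in the highlights above, in seconds.
three_surveys_s = 221.7    # sequential PHQ-9, GAD-7, and PCL-5
single_question_s = 78.4   # single open-ended video response

reduction = relative_reduction(three_surveys_s, single_question_s)
print(f"{reduction:.1%}")  # 64.6%
```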

Guideline-Based Recommendations

Diagnosis

  • Use a single validated open-ended question: 'In the last 2 weeks have you felt down, nervous, depressed, anxious, hopeless or on edge? If so, please explain in detail how it has bothered you or impacted your life?'
  • Employ multimodal analysis combining semantic text and voice prosody features to predict depression, anxiety, and PTSD symptom severity.
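
As a rough illustration of how text and voice features might be combined to predict three scores at once, the sketch below concatenates two embedding vectors and applies one linear head per questionnaire. This is an assumption for illustration only: the source does not describe the fusion method or the model architecture, and the embeddings, weights, and function names here are hypothetical toy values. Only the score ranges (PHQ-9: 0 to 27, GAD-7: 0 to 21, PCL-5: 0 to 80) are standard facts about the instruments.

```python
from typing import Dict, List, Tuple

# Valid total-score range for each questionnaire.
SCORE_RANGES: Dict[str, Tuple[int, int]] = {
    "PHQ-9": (0, 27),
    "GAD-7": (0, 21),
    "PCL-5": (0, 80),
}

def fuse(text_emb: List[float], voice_emb: List[float]) -> List[float]:
    """Fusion by simple concatenation (assumed; not confirmed by the source)."""
    return list(text_emb) + list(voice_emb)

def predict_scores(
    features: List[float],
    heads: Dict[str, Tuple[List[float], float]],
) -> Dict[str, float]:
    """Apply one linear head per instrument and clip to its valid range."""
    scores = {}
    for name, (weights, bias) in heads.items():
        raw = sum(w * x for w, x in zip(weights, features)) + bias
        low, high = SCORE_RANGES[name]
        scores[name] = min(max(raw, low), high)
    return scores

# Toy stand-ins: real MPNet and HuBERT embeddings have far more dimensions,
# and real weights would be learned from labelled data.
text_emb = [0.5, -0.2]
voice_emb = [0.1, 0.3]
heads = {
    "PHQ-9": ([4.0, 1.0, 2.0, 3.0], 8.0),
    "GAD-7": ([3.0, 0.0, 1.0, 2.0], 6.0),
    "PCL-5": ([90.0, 0.0, 50.0, 100.0], 20.0),
}

scores = predict_scores(fuse(text_emb, voice_emb), heads)
print(scores)
```

Clipping keeps each prediction inside the range the corresponding questionnaire can actually produce, so the outputs remain comparable to conventionally administered scores.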

Management

  • Integrate efficient multi-condition screening to reduce patient burden and improve engagement in high-volume or longitudinal clinical settings.
  • Use video-based screening tools to facilitate remote or in-person mental health assessments.

Monitoring & Follow-up

  • Monitor mental health symptoms longitudinally using the single-question multimodal approach to track changes efficiently over time.

Risks

  • Be aware that trauma exposure is not directly assessed; the model predicts current PTSD symptom severity but not trauma history.
  • Validate model performance across diverse demographic groups and confirm that accuracy is consistent between them to avoid bias.
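
One simple way to check for demographic consistency is to compute the prediction error separately for each group and compare. The sketch below uses mean absolute error, which is an assumed metric (the source does not state which one was used), and the records are made-up illustrative values, not study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def mae_by_group(
    records: Iterable[Tuple[str, float, float]],
) -> Dict[str, float]:
    """Mean absolute error per group.

    Each record is (group_label, questionnaire_score, predicted_score).
    """
    errors = defaultdict(list)
    for group, y_true, y_pred in records:
        errors[group].append(abs(y_true - y_pred))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}

# Illustrative records only; group labels "A" and "B" are placeholders.
records = [
    ("A", 10, 12),
    ("A", 5, 4),
    ("B", 8, 8),
    ("B", 15, 11),
]

group_mae = mae_by_group(records)
# Largest difference in error between any two groups.
gap = max(group_mae.values()) - min(group_mae.values())
print(group_mae, gap)
```

A large gap between groups would suggest the model is less reliable for some populations and needs further validation before deployment there.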

Patient & Prescribing Data

2,420 participants from five independent cohorts, diverse in age, gender, and race/ethnicity.

The single-question multimodal model demonstrates strong predictive performance and high acceptability, supporting its use as a scalable screening tool to guide further clinical evaluation and treatment planning.

Clinical Best Practices

  • Adopt multimodal machine learning tools combining text and voice analysis for efficient mental health screening.
  • Use a single validated open-ended question to reduce assessment time and patient fatigue.
  • Ensure screening tools are validated across demographic groups to maintain reliability and equity.
  • Incorporate video-based assessments to enhance engagement and accessibility in clinical workflows.
