Clinical Scorecard: Integrated Machine Learning Approaches for Video-Based Assessment of Mental Health with a Single Question
At a Glance
Category | Detail
Condition | Depression, Anxiety, and PTSD
Key Mechanisms | Multimodal machine learning model integrating textual (MPNet) and voice prosody (HuBERT) analysis of a single open-ended video response
Target Population | Adults undergoing mental health screening across diverse demographic groups
Care Setting | Primary care and clinical settings with limited mental health provider availability
Key Highlights
A single open-ended question predicts PHQ-9, GAD-7, and PCL-5 scores simultaneously, with accuracy comparable to traditional questionnaires.
The approach reduces assessment time by 64.6% (78.4 seconds vs. 221.7 seconds) compared to sequential administration of three separate surveys.
User acceptance was high: 90.7% were willing to use video-based screening and only 1.4% were unwilling to participate.
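The time saving quoted in the highlights can be checked directly from the two reported timings. The short sketch below does only that arithmetic; the function and variable names are illustrative and do not come from the study.

```python
# Timings reported in the scorecard, in seconds.
single_question_s = 78.4   # single open-ended video question
three_surveys_s = 221.7    # PHQ-9, GAD-7, and PCL-5 administered in sequence


def percent_reduction(new: float, baseline: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent."""
    return (baseline - new) / baseline * 100.0


reduction = percent_reduction(single_question_s, three_surveys_s)
print(f"{reduction:.1f}% shorter")  # prints "64.6% shorter"
```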
Guideline-Based Recommendations
Diagnosis
Use a single validated open-ended question: 'In the last 2 weeks have you felt down, nervous, depressed, anxious, hopeless or on edge? If so, please explain in detail how it has bothered you or impacted your life?'
Employ multimodal analysis combining semantic text and voice prosody features to predict depression, anxiety, and PTSD symptom severity.
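One way to picture the multimodal step is as feature fusion followed by one prediction head per questionnaire. The sketch below is a minimal illustration only: the scorecard does not say how the text and prosody features are combined, so simple concatenation with linear heads is an assumption, and every name and number in it is hypothetical. In practice the two vectors would be embeddings from MPNet and HuBERT and the weights would come from training. The score ranges used for clamping are the standard ones for each instrument (PHQ-9: 0 to 27, GAD-7: 0 to 21, PCL-5: 0 to 80).

```python
from typing import Dict, List, Sequence

# Standard total-score ranges for the three questionnaires.
SCORE_RANGES = {"phq9": (0, 27), "gad7": (0, 21), "pcl5": (0, 80)}


def fuse(text_emb: Sequence[float], prosody_emb: Sequence[float]) -> List[float]:
    """Assumed fusion strategy: concatenate the two feature vectors."""
    return list(text_emb) + list(prosody_emb)


def predict_scores(
    features: Sequence[float],
    weights: Dict[str, Sequence[float]],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Apply one linear head per questionnaire and clamp to its valid range."""
    scores = {}
    for target, (low, high) in SCORE_RANGES.items():
        raw = sum(w * x for w, x in zip(weights[target], features)) + biases[target]
        scores[target] = min(max(raw, low), high)
    return scores


# Toy vectors and weights, made up purely to show the data flow.
text_emb = [0.2, -0.1]   # stand-in for a text embedding
prosody_emb = [0.5]      # stand-in for a voice prosody embedding
weights = {"phq9": [10, 0, 10], "gad7": [0, 0, 100], "pcl5": [0, -10, 0]}
biases = {"phq9": 0.0, "gad7": 0.0, "pcl5": 0.0}

features = fuse(text_emb, prosody_emb)
scores = predict_scores(features, weights, biases)
print(scores)
```

Predicting all three scores from one fused vector is what lets a single response stand in for three separate surveys; the clamping step simply keeps each prediction inside the range the corresponding questionnaire can produce.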
Management
Integrate efficient multi-condition screening to reduce patient burden and improve engagement in high-volume or longitudinal clinical settings.
Use video-based screening tools to facilitate remote or in-person mental health assessments.
Monitoring & Follow-up
Monitor mental health symptoms longitudinally using the single-question multimodal approach to track changes efficiently over time.
Risks
Be aware that trauma exposure is not directly assessed; the model predicts current PTSD symptom severity but not trauma history.
Validate model performance across diverse populations and confirm that accuracy is consistent between demographic groups to avoid bias.
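A subgroup check of this kind can be as simple as comparing prediction error between demographic groups. The sketch below assumes mean absolute error as the metric and a fixed tolerance for flagging gaps; both choices, the group labels, and the example records are illustrative and are not results from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record: (group label, questionnaire score, model-predicted score).
Record = Tuple[str, float, float]


def mae_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Mean absolute error of the predictions, computed per group."""
    errors = defaultdict(list)
    for group, actual, predicted in records:
        errors[group].append(abs(actual - predicted))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}


def flag_gaps(group_mae: Dict[str, float], tolerance: float) -> List[str]:
    """Groups whose error exceeds the best-performing group's by more than `tolerance`."""
    best = min(group_mae.values())
    return [group for group, mae in group_mae.items() if mae - best > tolerance]


# Made-up example records, for illustration only.
records = [("A", 10, 12), ("A", 5, 4), ("B", 10, 16), ("B", 8, 8)]

group_mae = mae_by_group(records)
flagged = flag_gaps(group_mae, tolerance=1.0)
print(group_mae)  # {'A': 1.5, 'B': 3.0}
print(flagged)    # ['B']
```

A flagged group is a prompt for closer review (more data, recalibration, or a separate validation study), not proof of bias on its own.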
Patient & Prescribing Data
2,420 participants from five independent cohorts, diverse in age, gender, and race/ethnicity
The single-question multimodal model demonstrates strong predictive performance and high acceptability, supporting its use as a scalable screening tool to guide further clinical evaluation and treatment planning.
Clinical Best Practices
Adopt multimodal machine learning tools combining text and voice analysis for efficient mental health screening.
Use a single validated open-ended question to reduce assessment time and patient fatigue.
Ensure screening tools are validated across demographic groups to maintain reliability and equity.
Incorporate video-based assessments to enhance engagement and accessibility in clinical workflows.