Integrated ML for Video-Based Multi-Condition Mental Health Screening with One Question
Overview
This study presents a multimodal machine learning model that predicts depression, anxiety, and trauma symptom scores from a participant's video response to a single open-ended question. The approach reduces assessment time by 64.6% compared to traditional questionnaires (78.4 s versus 221.7 s) while maintaining strong predictive performance and high user acceptance across diverse demographics.
Background
Mental health disorders such as depression, anxiety, and PTSD have increased significantly worldwide, a trend exacerbated by the COVID-19 pandemic. Traditional screening relies on separate validated questionnaires such as the PHQ-9, GAD-7, and PCL-5, which can be time-consuming and burdensome in clinical settings. Remote assessment technologies that analyze the semantic and prosodic content of video responses offer scalable alternatives, but they often require multiple questions or focus on a single condition. This study aims to streamline multi-condition screening into a single-question video response, improving efficiency and engagement.
Data Highlights
| Metric | Single-Question Video Assessment | Traditional Questionnaires |
| --- | --- | --- |
| Assessment Time (seconds) | 78.4 | 221.7 |
| Time Reduction | 64.6% | |
| Participant Willingness to Use Video Screening | 90.7% | |
| Participants Unwilling to Use Video Screening | 1.4% | |
| Sample Size | 2420 participants | |
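The reported time reduction follows directly from the two assessment times. As a quick check of the arithmetic, the sketch below recomputes it; the function name `time_reduction_percent` is an illustrative choice, not something taken from the study.

```python
def time_reduction_percent(baseline_seconds: float, new_seconds: float) -> float:
    """Return the relative time saving of the new method, in percent."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - new_seconds) / baseline_seconds * 100.0


# Assessment times reported in the summary (seconds).
traditional_seconds = 221.7
video_seconds = 78.4

reduction = time_reduction_percent(traditional_seconds, video_seconds)
print(f"Time reduction: {reduction:.1f}%")  # Time reduction: 64.6%
```

In absolute terms, this corresponds to roughly 143 seconds saved per assessment.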
Key Findings
A single open-ended question effectively predicts scores on PHQ-9, GAD-7, and PCL-5 assessments using combined text and voice analysis.
The multimodal model integrates MPNet for semantic text analysis and HuBERT for voice prosody features.
Assessment time is reduced by 64.6% compared to administering three separate questionnaires (78.4 s vs 221.7 s).
User acceptance was high: 90.7% of participants were willing to use video-based screening and only 1.4% were unwilling.
Model performance is consistent across age, gender, and race/ethnicity, supporting broad applicability.
The approach enables simultaneous multi-condition screening, addressing comorbidities efficiently.
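The summary names MPNet and HuBERT as the text and voice encoders but does not describe how their outputs are combined. The sketch below illustrates one common option, late fusion by concatenation followed by one linear regression head per questionnaire. It is an assumption for illustration, not the study's confirmed architecture. The names `LinearHead`, `fuse`, and `predict_scores`, the toy two-dimensional embeddings, and the weights are all hypothetical. Real MPNet and HuBERT embeddings would have hundreds of dimensions and the heads would be learned from data. The score ranges are the standard total-score ranges of the three instruments.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Standard total-score ranges of the three questionnaires.
SCORE_RANGES = {"PHQ-9": (0, 27), "GAD-7": (0, 21), "PCL-5": (0, 80)}


@dataclass
class LinearHead:
    """Hypothetical linear regression head: score = w . x + b."""
    weights: List[float]
    bias: float

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def fuse(text_embedding: Sequence[float], voice_embedding: Sequence[float]) -> List[float]:
    """Late fusion by concatenation (an assumed fusion strategy)."""
    return list(text_embedding) + list(voice_embedding)


def predict_scores(
    text_embedding: Sequence[float],
    voice_embedding: Sequence[float],
    heads: Dict[str, LinearHead],
) -> Dict[str, float]:
    """Predict one score per questionnaire from a single fused representation."""
    fused = fuse(text_embedding, voice_embedding)
    scores = {}
    for name, head in heads.items():
        if len(head.weights) != len(fused):
            raise ValueError(f"{name}: expected {len(fused)} weights")
        low, high = SCORE_RANGES[name]
        # Clamp the raw prediction to the questionnaire's valid range.
        scores[name] = min(max(head.predict(fused), low), high)
    return scores


# Toy stand-ins for an MPNet text embedding and a HuBERT voice embedding.
text_vec = [0.5, -1.0]
voice_vec = [2.0, 0.0]

# Hand-picked weights for illustration only; real heads would be trained.
heads = {
    "PHQ-9": LinearHead(weights=[2.0, 0.0, 1.0, 0.0], bias=5.0),
    "GAD-7": LinearHead(weights=[0.0, -1.0, 0.0, 0.0], bias=3.0),
    "PCL-5": LinearHead(weights=[0.0, 0.0, 50.0, 0.0], bias=0.0),
}

scores = predict_scores(text_vec, voice_vec, heads)
print(scores)
```

Sharing one fused representation across three heads is what lets a single response yield all three scores at once. Other fusion strategies, such as attention-based or score-level fusion, would fit the same description in the summary.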
Clinical Implications
This single-question video-based screening tool can streamline mental health assessments in clinical settings, reducing patient burden and provider time without sacrificing accuracy. Its high acceptance and demographic consistency support its use as a scalable method for early detection of depression, anxiety, and trauma symptoms. Integrating such tools may improve timely identification and treatment planning, especially in resource-limited environments.
Conclusion
The study demonstrates that a multimodal model applied to a single video response can predict symptom scores for depression, anxiety, and trauma, cutting screening time by 64.6% while retaining high user acceptance. This approach offers a promising solution to current mental health assessment challenges in clinical practice.
References
Integrated Machine Learning Approaches for Video-Based Assessment of Mental Health with a Single Question