Multimodal machine learning for video based single question mental health assessment

By
Bradley Grimm
Pernille Yilmam
Brett Talbot
Loren Larsen
December 16, 2025

npj Digital Medicine

Overview

This study presents a multimodal machine learning model that predicts depression, anxiety, and trauma symptom scores from a single open-ended video question. The approach reduces assessment time by 64.6% compared to traditional questionnaires while maintaining strong predictive performance and high user acceptance across diverse demographics.

Background

Mental health disorders such as depression, anxiety, and PTSD have increased significantly worldwide, a trend exacerbated by the COVID-19 pandemic. Traditional screening relies on a separate validated questionnaire for each condition, such as the PHQ-9 for depression, the GAD-7 for anxiety, and the PCL-5 for PTSD, which makes multi-condition screening time-consuming and burdensome in clinical settings. Remote assessment technologies that analyze the semantic content and prosody of video responses offer scalable alternatives, but they often require multiple questions or focus on a single condition. This study aims to streamline multi-condition screening into a single-question video response, improving efficiency and engagement.

Data Highlights

Metric	Single-Question Video Assessment	Traditional Questionnaires
Assessment time (seconds)	78.4	221.7

Time reduction: 64.6%
Participants willing to use video screening: 90.7%
Participants unwilling to use video screening: 1.4%
Sample size: 2,420 participants
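
The reported time reduction follows directly from the two assessment times. As a quick check, a minimal sketch in Python (the variable and function names are illustrative, not from the study):

```python
# Mean assessment times reported in the study summary, in seconds.
video_seconds = 78.4
questionnaire_seconds = 221.7


def relative_reduction(baseline: float, new: float) -> float:
    """Return the fraction of the baseline time saved by the new method."""
    return (baseline - new) / baseline


saved_seconds = questionnaire_seconds - video_seconds
reduction = relative_reduction(questionnaire_seconds, video_seconds)

print(f"Time saved per assessment: {saved_seconds:.1f} s")
print(f"Relative reduction: {reduction:.1%}")
```

The result rounds to the 64.6% figure quoted throughout the summary, which corresponds to roughly 143 seconds saved per assessment.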

Key Findings

Responses to a single open-ended question, analyzed with combined text and voice features, predict scores on the PHQ-9, GAD-7, and PCL-5.
The multimodal model integrates MPNet for semantic text analysis and HuBERT for voice prosody features.
Assessment time is reduced by 64.6% compared to administering three separate questionnaires (78.4 s vs 221.7 s).
User acceptance was high: 90.7% of participants were willing to use video-based screening, and only 1.4% were unwilling.
Model performance is consistent across age, gender, and race/ethnicity, supporting broad applicability.
The approach enables simultaneous multi-condition screening, addressing comorbidities efficiently.
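
To make the multimodal, multi-condition idea concrete, here is a minimal sketch of one way to combine two modality embeddings and predict three scores at once. It is not the study's implementation: the summary does not say how the text and voice features are fused, so concatenation followed by one linear head per instrument is an assumption. The embeddings are random stand-ins for MPNet and HuBERT outputs, the weights are untrained, and all names (`fuse`, `predict_scores`, `heads`) are hypothetical. Only the score ranges (PHQ-9: 0 to 27, GAD-7: 0 to 21, PCL-5: 0 to 80) are standard properties of the instruments.

```python
import random

# Valid total-score ranges of the three screening instruments.
SCORE_RANGES = {"PHQ-9": (0, 27), "GAD-7": (0, 21), "PCL-5": (0, 80)}


def fuse(text_embedding, voice_embedding):
    """Feature-level fusion (assumed): concatenate the two modality vectors."""
    return list(text_embedding) + list(voice_embedding)


def predict_scores(features, heads):
    """Apply one linear head per instrument and clip to its valid range."""
    scores = {}
    for name, (weights, bias) in heads.items():
        raw = sum(w * x for w, x in zip(weights, features)) + bias
        low, high = SCORE_RANGES[name]
        scores[name] = min(max(raw, low), high)
    return scores


rng = random.Random(0)
TEXT_DIM, VOICE_DIM = 8, 8  # toy sizes; real encoder outputs are far larger

# Random stand-ins for the text (MPNet) and voice (HuBERT) embeddings.
text_embedding = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
voice_embedding = [rng.gauss(0, 1) for _ in range(VOICE_DIM)]

# Untrained, randomly initialized heads, for illustration only.
heads = {
    name: ([rng.gauss(0, 1) for _ in range(TEXT_DIM + VOICE_DIM)], (low + high) / 2)
    for name, (low, high) in SCORE_RANGES.items()
}

features = fuse(text_embedding, voice_embedding)
scores = predict_scores(features, heads)
print(scores)
```

Sharing one fused feature vector across all three heads is what lets a single response yield depression, anxiety, and trauma estimates together; in a trained system the head weights would be fit against questionnaire scores.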

Clinical Implications

This single-question video-based screening tool could streamline mental health assessments in clinical settings, reducing patient burden and provider time while maintaining strong predictive performance. Its high acceptance and demographic consistency support its use as a scalable method for early detection of depression, anxiety, and trauma symptoms. Integrating such tools may improve timely identification and treatment planning, especially in resource-limited environments.
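
Demographic consistency is typically checked by comparing prediction error across subgroups. The sketch below shows one simple way to do that with mean absolute error (MAE) per group. The metric choice, the group labels, and every number in `records` are hypothetical illustrations, not data or methods from the study.

```python
from collections import defaultdict


def mae_by_group(records):
    """Mean absolute error per subgroup.

    Each record is a (group, true_score, predicted_score) tuple.
    """
    errors = defaultdict(list)
    for group, true_score, predicted_score in records:
        errors[group].append(abs(true_score - predicted_score))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}


# Hypothetical questionnaire scores and model predictions, for illustration only.
records = [
    ("18-34", 12, 10),
    ("18-34", 5, 7),
    ("35-54", 20, 17),
    ("35-54", 3, 4),
    ("55+", 9, 9),
    ("55+", 15, 12),
]

group_mae = mae_by_group(records)
gap = max(group_mae.values()) - min(group_mae.values())

for group, mae in group_mae.items():
    print(f"{group}: MAE = {mae:.2f}")
print(f"Largest gap between groups: {gap:.2f}")
```

A small gap between the best and worst subgroup is one signal that a model performs consistently; a full evaluation would also account for subgroup sample sizes and uncertainty.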

Conclusion

The study demonstrates that a multimodal model applied to the response to a single video question can predict symptom scores for multiple mental health conditions, substantially improving screening efficiency while achieving high user acceptance. This approach offers a promising solution to current mental health assessment challenges in clinical practice.

Original Source(s)

Multimodal machine learning for video based single question mental health assessment