Multimodal machine learning for video based single question mental health assessment - Report - MDSpire


  • By Bradley Grimm, Pernille Yilmam, Brett Talbot, and Loren Larsen

  • December 16, 2025


Integrated ML for Video-Based Multi-Condition Mental Health Screening with One Question

Overview

This study presents a multimodal machine learning model that predicts depression, anxiety, and trauma symptom scores from a single open-ended video question. The approach reduces assessment time by 64.6% compared to traditional questionnaires while maintaining strong predictive performance and high user acceptance across diverse demographics.

Background

The prevalence of mental health disorders such as depression, anxiety, and post-traumatic stress disorder (PTSD) has risen significantly worldwide, a trend exacerbated by the COVID-19 pandemic. Traditional screening relies on separate validated questionnaires, such as the PHQ-9 for depression, the GAD-7 for anxiety, and the PCL-5 for PTSD, which can be time-consuming to administer together and burdensome in clinical settings. Remote assessment technologies that analyze the semantic content and prosody of video responses offer scalable alternatives, but they often require multiple questions or target a single condition. This study aims to consolidate multi-condition screening into the response to a single video question, improving efficiency and engagement.

Data Highlights

Metric | Single-question video assessment | Traditional questionnaires (PHQ-9, GAD-7, PCL-5)
--- | --- | ---
Assessment time (s) | 78.4 | 221.7
Time reduction vs. traditional questionnaires | 64.6% | n/a
Participants willing to use video screening | 90.7% | n/a
Participants unwilling to use video screening | 1.4% | n/a

Sample size: 2,420 participants
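The 64.6% figure is the time saved relative to the traditional questionnaire battery, expressed as a share of the baseline time. The arithmetic can be checked with a few lines of Python; the helper name `percent_reduction` is illustrative only.

```python
def percent_reduction(baseline_s: float, new_s: float) -> float:
    """Relative time saved by the new method, as a percentage of the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_s - new_s) / baseline_s * 100.0


traditional_s = 221.7  # PHQ-9 + GAD-7 + PCL-5 administered separately
video_s = 78.4         # single open-ended video question

saved_s = traditional_s - video_s
print(f"Time saved: {saved_s:.1f} s ({percent_reduction(traditional_s, video_s):.1f}%)")
# prints: Time saved: 143.3 s (64.6%)
```

In other words, the single question saves about 143 seconds per participant, and the video assessment takes roughly a third of the time of the three questionnaires combined.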

Key Findings

  • A single open-ended question effectively predicts scores on PHQ-9, GAD-7, and PCL-5 assessments using combined text and voice analysis.
  • The multimodal model integrates MPNet for semantic text analysis and HuBERT for voice prosody features.
  • Assessment time is reduced by 64.6% compared to administering three separate questionnaires (78.4 s vs 221.7 s).
  • User acceptance is high: 90.7% of participants were willing to use video-based screening, and only 1.4% were unwilling.
  • Model performance is consistent across age, gender, and race/ethnicity, supporting broad applicability.
  • The approach enables simultaneous multi-condition screening, addressing comorbidities efficiently.
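The summary does not describe how the MPNet text features and HuBERT voice features are combined. As a minimal sketch only, assuming early fusion (concatenating one 768-dimensional embedding per modality, the hidden size of the base variants of both models) followed by a linear multi-output ridge regressor, predicting all three questionnaire scores from one response could look like the code below. The embeddings and targets are random placeholders rather than real model outputs or study data, and the function names `fuse`, `fit_ridge`, and `predict` are hypothetical.

```python
import numpy as np

# ASSUMPTION: base-size models, so both embeddings are 768-dimensional.
TEXT_DIM = 768   # e.g. a pooled MPNet sentence embedding of the transcript
AUDIO_DIM = 768  # e.g. mean-pooled HuBERT hidden states of the voice track

# Valid total-score range of each questionnaire (min, max).
SCORE_RANGES = {"PHQ-9": (0, 27), "GAD-7": (0, 21), "PCL-5": (0, 80)}
TARGETS = list(SCORE_RANGES)


def fuse(text_emb: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Hypothetical early fusion: concatenate text and audio embeddings per response."""
    return np.concatenate([text_emb, audio_emb], axis=-1)


def fit_ridge(X: np.ndarray, Y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form multi-output ridge regression with an unpenalized bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not shrink the bias term
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)


def predict(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predict all three scores at once, clipped to each scale's valid range."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    lo = np.array([SCORE_RANGES[t][0] for t in TARGETS])
    hi = np.array([SCORE_RANGES[t][1] for t in TARGETS])
    return np.clip(Xb @ W, lo, hi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    # Random placeholders standing in for real MPNet / HuBERT embeddings.
    X = fuse(rng.normal(size=(n, TEXT_DIM)), rng.normal(size=(n, AUDIO_DIM)))
    Y = rng.uniform([0, 0, 0], [27, 21, 80], size=(n, 3))
    W = fit_ridge(X, Y, alpha=10.0)
    for row in predict(W, X[:3]):
        print(dict(zip(TARGETS, row.round(1))))
```

Fitting one model with three outputs on a shared fused representation is what lets a single answer be scored for depression, anxiety, and trauma symptoms at the same time. Clipping keeps each prediction inside its scale's valid range (PHQ-9: 0 to 27, GAD-7: 0 to 21, PCL-5: 0 to 80).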

Clinical Implications

This single-question video-based screening tool can streamline mental health assessments in clinical settings, reducing patient burden and provider time without sacrificing accuracy. Its high acceptance and demographic consistency support its use as a scalable method for early detection of depression, anxiety, and trauma symptoms. Integrating such tools may improve timely identification and treatment planning, especially in resource-limited environments.
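The summary reports that performance is consistent across age, gender, and race/ethnicity but does not say which metric was compared. A minimal sketch, assuming mean absolute error (MAE) computed per subgroup and the gap between the best and worst subgroup as the consistency check, is shown below. The scores and age bands are made-up toy values, not study data, and the helper names are hypothetical.

```python
from collections import defaultdict


def mae_by_group(y_true, y_pred, groups):
    """Mean absolute error of predicted scores, computed separately per subgroup."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have equal length")
    errors = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errors[g].append(abs(t - p))
    return {g: sum(e) / len(e) for g, e in errors.items()}


def max_gap(per_group):
    """Largest difference in error between any two subgroups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy PHQ-9 scores (made-up values for illustration only).
y_true = [4, 12, 18, 7, 9, 15]
y_pred = [5, 10, 17, 7, 11, 14]
age_band = ["18-34", "18-34", "35-54", "35-54", "55+", "55+"]

per_group = mae_by_group(y_true, y_pred, age_band)
print(per_group)  # prints: {'18-34': 1.5, '35-54': 0.5, '55+': 1.5}
print(f"max MAE gap across age bands: {max_gap(per_group):.2f}")  # prints: 1.00
```

Gender and race/ethnicity would be checked the same way by passing a different `groups` list; a small gap relative to each scale's range is what "consistent across demographics" would mean under this assumed metric.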

Conclusion

The study demonstrates that the response to a single video question, analyzed with combined text and voice features, can reliably predict symptom scores for multiple mental health conditions, substantially improving screening efficiency and patient engagement. This approach offers a promising solution to current challenges in mental health assessment in clinical practice.

References

  1. Integrated Machine Learning Approaches for Video-Based Assessment of Mental Health with a Single Question
