A gender-emotion interaction multi-task network for depression recognition via transformer-based multimodal fusion

  • By Yujuan Xing, Ruifang He, Xiaoli Cao, Ping Tan, and Li Chen

  • June 19, 2026

Clinical Report: A Multi-Task Network Integrating Gender and Emotion for Recognizing Depression

Overview

This study introduces a gender-emotion interaction multi-task network (G-EIMTNet) that recognizes depression through transformer-based multimodal fusion. Relative to baseline models, the proposed method is reported to gain +15.88% in accuracy and +14.73% in F1 score.

Background

Depression is a prevalent mental disorder that significantly impacts individuals' daily functioning and quality of life. Traditional diagnostic methods are often subjective and inefficient, highlighting the need for automated, non-invasive approaches. Speech-based features present a promising avenue for depression detection due to their ability to convey emotional states.

Data Highlights

Metric      Baseline Model    G-EIMTNet
Accuracy    -                 +15.88%
F1 Score    -                 +14.73%
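
For readers less familiar with the two metrics, the sketch below shows how accuracy and F1 score are conventionally computed for a binary classifier. It is an illustration only: the function name `accuracy_and_f1`, the choice of label 1 as the positive (depressed) class, and the toy labels are assumptions made here, not data or code from the study.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical toy labels (1 = depressed, 0 = not depressed).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

F1 is often reported alongside accuracy in depression recognition because the classes are typically imbalanced, and accuracy alone can look high for a model that rarely predicts the positive class.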

Key Findings

  • The G-EIMTNet outperformed baseline models in both accuracy and F1 score.
  • Feature extraction utilized Mel-spectrograms and CNNs to capture complex time-frequency features.
  • The minimum redundancy maximum relevance (MRMR) algorithm was effective in selecting acoustic features correlated with emotions and depressive states.
  • Cross-modal attention mechanisms improved the fusion of heterogeneous information from different modalities.
  • Ablation studies confirmed the significance of multi-modal fusion and gender-emotion interaction.

Clinical Implications

Integrating gender and emotional factors into depression recognition models can enhance diagnostic accuracy.

Conclusion

The G-EIMTNet represents an advancement in automated depression recognition, leveraging multimodal data.
