Language-based detection of depression with machine learning: systematic review and meta-analysis - Report - MDSpire

By Hadar Fisher, Nigel M. Jaffe, Kristina Pidvirny, Anna O. Tierney, Mia S. Vaidean, Poorvesh Dongre, and Christian A. Webb

February 24, 2026

Automated Depression Detection via Language Analysis: Systematic Review & Meta-Analysis

Overview

This systematic review and meta-analysis evaluated 123 studies that used natural language processing and machine learning to detect depression from text. Pooled accuracy was 0.80 (43 studies), with pooled precision of 0.78, recall of 0.76, and AUC of 0.79, indicating promising but heterogeneous performance.

Background

Early identification of depression is crucial for timely intervention and improved outcomes. Advances in natural language processing (NLP) and machine learning (ML) have enabled automated detection of depression from spoken or written language. Despite growing research, the overall diagnostic performance and factors influencing accuracy remain unclear. This review synthesizes existing evidence to assess the effectiveness and limitations of these automated approaches.

Data Highlights

Metric             Number of Studies   Pooled Estimate
Accuracy           43                  0.80
Precision          28                  0.78
Recall             33                  0.76
AUC                14                  0.79
Balanced Accuracy  16                  0.71
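
For readers less familiar with the metrics listed above, the following is a minimal sketch of how each is conventionally computed from a binary confusion matrix. The function name `classification_metrics`, the `Metrics` container, and the counts are hypothetical and purely illustrative; none of them come from the review.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    balanced_accuracy: float


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Metrics:
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive case,
    one negative case, and one positive prediction).
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)        # sensitivity: depressed cases correctly flagged
    specificity = tn / (tn + fp)   # non-depressed cases correctly cleared
    return Metrics(
        accuracy=(tp + tn) / total,
        precision=tp / (tp + fp),  # flagged cases that are truly depressed
        recall=recall,
        balanced_accuracy=(recall + specificity) / 2,
    )


# Illustrative counts only (hypothetical sample: 100 depressed, 900 not depressed)
m = classification_metrics(tp=70, fp=90, tn=810, fn=30)
print(m)
```

Balanced accuracy averages recall and specificity, so it is less sensitive to class imbalance than plain accuracy. That is one possible reason the two pooled values can differ, although here they are also pooled over different sets of studies, and the source does not say which factor matters more.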

Key Findings

  • Pooled accuracy of automated depression detection from language was 0.80 across 40,983 text samples.
  • Pooled precision and recall were 0.78 and 0.76, respectively, suggesting that false alarms and missed cases occurred at broadly similar rates.
  • Area under the curve (AUC) was 0.79, supporting good discriminative ability.
  • Significant heterogeneity existed, influenced by language, text source, feature type, and classifier used.
  • Accuracy was highest in studies using structured clinical interviews, non-English languages, and linguistic or embedding-based features.
  • In meta-regression, text source was the only significant predictor, explaining 13.6% of between-study variance.
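
This summary does not say which pooling model the review used. As a minimal sketch of how a pooled estimate, between-study variance, and "variance explained" figure can be obtained, the code below assumes a DerSimonian-Laird random-effects model applied to raw proportions. The function names, study accuracies, and sample sizes are hypothetical; published meta-analyses often transform proportions (for example with a logit) before pooling, which is omitted here for brevity.

```python
def dersimonian_laird(estimates, variances):
    """Random-effects pooling with the DerSimonian-Laird estimator.

    Returns (pooled_estimate, tau2, i2), where tau2 is the between-study
    variance and i2 is the share of total variability due to heterogeneity.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, tau2, i2


def variance_explained(tau2_null, tau2_model):
    """Pseudo-R^2: share of between-study variance removed by a moderator."""
    if tau2_null == 0:
        return 0.0
    return max(0.0, (tau2_null - tau2_model) / tau2_null)


# Hypothetical per-study accuracies and sample sizes (not from the review)
accuracies = [0.72, 0.85, 0.78, 0.90, 0.66]
sizes = [200, 150, 400, 120, 300]
# Binomial sampling variance of each study's accuracy
variances = [p * (1 - p) / n for p, n in zip(accuracies, sizes)]

pooled, tau2, i2 = dersimonian_laird(accuracies, variances)
print(f"pooled={pooled:.3f} tau2={tau2:.4f} I2={i2:.1%}")
```

A statement such as "explained 13.6% of between-study variance" typically compares the between-study variance before and after adding the moderator. With hypothetical values, `variance_explained(0.0100, 0.00864)` returns approximately 0.136.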

Clinical Implications

Automated depression detection using NLP and ML shows potential as a supplementary screening tool, especially when applied to structured clinical interviews and diverse languages. However, substantial variability in methods and performance underscores the need for standardized protocols and rigorous validation before clinical implementation. Clinicians should interpret automated results cautiously and in conjunction with comprehensive clinical assessment.

Conclusion

Automated language-based depression detection demonstrates promising accuracy but is limited by heterogeneity and methodological inconsistencies. Future research should focus on standardization and external validation to enable reliable clinical application.
