Language-based detection of depression with machine learning: systematic review and meta-analysis

By
Hadar Fisher
Nigel M. Jaffe
Kristina Pidvirny
Anna O. Tierney
Mia S. Vaidean
Poorvesh Dongre
Christian A. Webb
February 24, 2026

npj Digital Medicine

Overview

This systematic review and meta-analysis evaluated 123 studies using natural language processing and machine learning to detect depression from text. Pooled accuracy across 43 studies was 0.80, with precision 0.78, recall 0.76, and AUC 0.79, indicating promising but heterogeneous performance.

Background

Early identification of depression is crucial for timely intervention and improved outcomes. Advances in natural language processing (NLP) and machine learning (ML) have enabled automated detection of depression from spoken or written language. Despite growing research, the overall diagnostic performance and factors influencing accuracy remain unclear. This review synthesizes existing evidence to assess the effectiveness and limitations of these automated approaches.

Data Highlights

Metric	Number of Studies	Pooled Estimate
Accuracy	43	0.80
Precision	28	0.78
Recall	33	0.76
AUC	14	0.79
Balanced Accuracy	16	0.71
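
For readers less familiar with these metrics, the sketch below shows how each one is derived from a single study's confusion matrix. The counts are hypothetical and are not taken from any reviewed study; the names `ConfusionCounts` and `classification_metrics` are illustrative only. The example uses an imbalanced sample to show one way raw accuracy can sit above balanced accuracy. This is not offered as an explanation of the pooled figures above, which come from different subsets of studies.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary depressed / not-depressed classifier (hypothetical)."""
    tp: int  # depressed, flagged as depressed
    fp: int  # not depressed, flagged as depressed
    tn: int  # not depressed, not flagged
    fn: int  # depressed, not flagged


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute standard metrics; assumes every denominator is non-zero."""
    total = c.tp + c.fp + c.tn + c.fn
    recall = c.tp / (c.tp + c.fn)        # share of depressed cases detected
    specificity = c.tn / (c.tn + c.fp)   # share of non-depressed cases cleared
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),
        "recall": recall,
        # Mean of recall and specificity, so the majority class cannot dominate.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical imbalanced sample: 20 depressed and 80 non-depressed texts.
example = ConfusionCounts(tp=10, fp=10, tn=70, fn=10)
metrics = classification_metrics(example)
print(metrics)
```

In this made-up sample, accuracy is 0.80 while balanced accuracy is about 0.69, because the classifier does well on the larger non-depressed group but detects only half of the depressed cases.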

Key Findings

Pooled accuracy of automated depression detection from language was 0.80 across 40,983 text samples.
Precision and recall were 0.78 and 0.76, respectively, indicating a rough balance between false positives and missed cases.
Area under the curve (AUC) was 0.79, supporting good discriminative ability.
Significant heterogeneity existed, influenced by language, text source, feature type, and classifier used.
Accuracy was highest in studies using structured clinical interviews, non-English languages, and linguistic or embedding-based features.
Text source was the only significant predictor in meta-regression, explaining 13.6% of between-study variance.
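
The pooling and heterogeneity ideas behind these findings can be sketched as follows. This is a minimal illustration, assuming a DerSimonian-Laird random-effects model applied to untransformed accuracies; the summary does not state which pooling model or transformation the authors used. The study values and the function names `dersimonian_laird` and `variance_explained` are hypothetical.

```python
def dersimonian_laird(estimates, variances):
    """Random-effects pooling (DerSimonian-Laird), assumed here for illustration."""
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures how far studies scatter around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's own variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "tau2": tau2, "i2": i2, "q": q}


def variance_explained(tau2_null, tau2_model):
    """Meta-regression R^2 analog: proportional drop in tau^2 after adding a moderator."""
    if tau2_null <= 0:
        raise ValueError("tau2_null must be positive")
    return max(0.0, (tau2_null - tau2_model) / tau2_null)


# Hypothetical accuracies and sample sizes for three studies.
accuracies = [0.72, 0.80, 0.88]
sizes = [200, 150, 300]
# Binomial variance of a proportion, p(1 - p) / n.
variances = [p * (1 - p) / n for p, n in zip(accuracies, sizes)]

result = dersimonian_laird(accuracies, variances)
print(result)
```

Under this definition, a moderator such as text source would explain 13.6% of between-study variance if adding it reduced the estimated tau^2 by that proportion, for example from a hypothetical 0.0100 to 0.00864.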

Clinical Implications

Automated depression detection using NLP and ML shows potential as a supplementary screening tool, especially when applied to structured clinical interviews and diverse languages. However, substantial variability in methods and performance underscores the need for standardized protocols and rigorous validation before clinical implementation. Clinicians should interpret automated results cautiously and in conjunction with comprehensive clinical assessment.

Conclusion

Automated language-based depression detection demonstrates promising accuracy but is limited by heterogeneity and methodological inconsistencies. Future research should focus on standardization and external validation to enable reliable clinical application.

