Objective:
To systematically review and meta-analyze studies applying NLP and ML for the automatic detection of depression from various types of text data, including both spoken and written language.
Key Findings:
Pooled accuracy was 0.80 based on 43 studies with 40,983 text samples.
Pooled precision was 0.78 (28 studies), recall was 0.76 (33 studies), AUC was 0.79 (14 studies), and balanced accuracy was 0.71 (16 studies).
Accuracy was highest in studies that used structured clinical interviews, analyzed non-English text, or relied on linguistic or embedding-based features.
Text source was the only significant predictor in meta-regressions, explaining 13.6% of the between-study variance.
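As a rough illustration of what a pooled estimate and "variance explained" mean here, the sketch below pools study-level accuracies with a random-effects model and computes the share of between-study variance accounted for by a moderator. It rests on several assumptions: the summary does not say which estimator or transformation the authors used, so the DerSimonian-Laird estimator, the untransformed proportions, the function names, and all input numbers are hypothetical and chosen only for illustration.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling using the DerSimonian-Laird tau^2 estimator.

    effects:   per-study estimates (here: accuracies as raw proportions)
    variances: per-study sampling variances
    Returns (pooled estimate, standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def variance_explained(tau2_null, tau2_model):
    """Share of between-study variance removed by adding a moderator."""
    if tau2_null <= 0:
        return 0.0
    return max(0.0, 1.0 - tau2_model / tau2_null)


# Hypothetical inputs, NOT data from the reviewed studies.
accuracies = [0.85, 0.75, 0.80]
sample_sizes = [200, 500, 1000]
# Binomial sampling variance of a proportion: p * (1 - p) / n
variances = [p * (1 - p) / n for p, n in zip(accuracies, sample_sizes)]

pooled, se, tau2 = dersimonian_laird(accuracies, variances)
print(f"pooled accuracy = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.5f}")
```

Under this reading, a moderator that lowers the residual between-study variance from a hypothetical 0.0500 to 0.0432 would explain 13.6% of it, since 1 - 0.0432 / 0.0500 = 0.136. Published meta-analyses of proportions often pool on a logit or arcsine scale instead of the raw scale used in this sketch.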
Interpretation:
Automated depression detection from text shows promising performance but also substantial heterogeneity, indicating a need for methodological standardization and validation to enhance reliability.
Limitations:
Evidence on performance across different languages and text sources is limited, which may affect generalizability.
Heterogeneity in study methodologies and sample characteristics may influence the overall findings.
Conclusion:
Findings highlight both the limitations and potential of text-based depression detection, emphasizing the critical need for further research and methodological standardization before clinical application.