Objective:
To train and evaluate a German-language BERT-based model that predicts Montgomery-Åsberg Depression Rating Scale (MADRS) scores from structured clinical interviews.
Key Findings:
The fine-tuned model achieved a mean absolute error (MAE) of 0.7–1.0 points across symptom items, each of which is rated on a 0–6 scale, indicating low prediction error.
Accuracy ranged from 79% to 88% across items, indicating close agreement with clinician ratings.
Fine-tuning reduced prediction errors by 75% relative to the untrained model.
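The three metrics above can be illustrated with a minimal Python sketch. It is not the study's evaluation code: the function names, the `tolerance` parameter, and the toy scores are all assumptions made for illustration, and the summary does not state how accuracy was defined (exact match is assumed here as the default).

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference item scores."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def accuracy(predicted: Sequence[float], reference: Sequence[float],
             tolerance: float = 0.0) -> float:
    """Share of predictions within `tolerance` points of the reference score.

    Assumption: tolerance=0 means exact match; the source does not say
    which definition of accuracy was used.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance)
    return hits / len(predicted)


def relative_error_reduction(baseline_error: float, finetuned_error: float) -> float:
    """Fraction by which the error dropped relative to the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - finetuned_error) / baseline_error


# Made-up scores for one MADRS item (0-6 scale); NOT data from the study.
clinician = [2, 4, 0, 3]
baseline_pred = [4, 2, 2, 1]    # hypothetical model before fine-tuning
finetuned_pred = [2, 3, 0, 4]   # hypothetical model after fine-tuning

baseline_mae = mean_absolute_error(baseline_pred, clinician)    # 2.0
finetuned_mae = mean_absolute_error(finetuned_pred, clinician)  # 0.5

print(finetuned_mae)                                            # 0.5
print(accuracy(finetuned_pred, clinician))                      # 0.5
print(relative_error_reduction(baseline_mae, finetuned_mae))    # 0.75
```

In this toy case the error falls from 2.0 to 0.5 points, a relative reduction of 0.75. The numbers were chosen only to make the arithmetic easy to follow.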
Interpretation:
The findings suggest that lightweight LLMs can assess depressive symptom severity in close agreement with clinician ratings, offering a scalable tool to support clinical decision-making and the monitoring of treatment progress.
Limitations:
The model's performance may differ in patient populations not represented in the training data, which limits generalizability.
The study excluded the 'Apparent Sadness' item, so the model does not cover the full MADRS, which may affect overall assessment accuracy.
Conclusion:
The study demonstrates the potential of fine-tuned LLMs for automated assessment of depressive symptoms, particularly in low-resource settings.
by Samantha Weber, Nicolas Deperrois, Robert Heun, Laura Frühschütz, Anna Monn, Stephanie Homan, Andrea Häfliger, Erich Seifritz, Tobias Kowatsch, Birgit Kleim, Sebastian Olbrich