Using a fine-tuned large language model for symptom-based depression evaluation - Summary - MDSpire

  • By Samantha Weber, Nicolas Deperrois, Robert Heun, Laura Frühschütz, Anna Monn, Stephanie Homan, Andrea Häfliger, Erich Seifritz, Tobias Kowatsch, Birgit Kleim, Sebastian Olbrich

  • October 7, 2025

Objective:

To train and evaluate a German BERT-based model specifically for predicting Montgomery-Åsberg Depression Rating Scale (MADRS) scores from structured clinical interviews.

Key Findings:
  • The fine-tuned model achieved a mean absolute error (MAE) of 0.7–1.0 across symptom items. Each MADRS item is rated from 0 to 6, so predictions were on average within about one scale point of the clinician rating.
  • Accuracies ranged from 79% to 88%, indicating close agreement with clinician ratings.
  • Fine-tuning reduced prediction errors by 75% compared to the untrained model.
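
To make the three metrics above concrete, here is a minimal sketch of how MAE, accuracy, and relative error reduction could be computed for a single MADRS item. The scores are invented for illustration and are not study data. The function names and the `tolerance` parameter are assumptions, since the summary does not state how accuracy was defined (exact match or within some margin).

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between clinician and model item scores."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def accuracy(y_true, y_pred, tolerance=0):
    """Share of predictions within `tolerance` points of the clinician score.

    tolerance=0 means exact match; the study's definition is not given
    in the summary, so this parameter is an assumption.
    """
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def relative_error_reduction(baseline_mae, tuned_mae):
    """Fraction by which the MAE drops relative to the baseline."""
    return (baseline_mae - tuned_mae) / baseline_mae


# Hypothetical scores for one item (0-6 scale), NOT data from the study.
clinician = [0, 2, 4, 6, 3, 1, 5, 2]
baseline = [3, 3, 3, 3, 3, 3, 3, 3]   # stand-in for an untrained model
finetuned = [0, 2, 3, 6, 4, 1, 5, 2]  # stand-in for the fine-tuned model

baseline_mae = mean_absolute_error(clinician, baseline)
tuned_mae = mean_absolute_error(clinician, finetuned)

print(f"baseline MAE:   {baseline_mae:.3f}")
print(f"fine-tuned MAE: {tuned_mae:.3f}")
print(f"exact accuracy: {accuracy(clinician, finetuned):.0%}")
print(f"error reduction: {relative_error_reduction(baseline_mae, tuned_mae):.0%}")
```

Under this reading, a 75% reduction would mean the fine-tuned MAE is one quarter of the baseline MAE.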
Interpretation:

The findings indicate that lightweight LLMs can effectively assess depressive symptom severity, providing a scalable tool for clinical decision-making and monitoring treatment progress.

Limitations:
  • The model's performance may vary with different patient populations not represented in the training data, potentially affecting generalizability.
  • The study excluded the 'Apparent Sadness' item, which may limit the comprehensiveness of symptom assessment and affect overall accuracy.
Conclusion:

The study demonstrates the potential of fine-tuned LLMs for automated assessment of depressive symptoms, particularly in low-resource settings.
