Low-energy Small Language Models with Retrieval-Augmented Generation can Surpass Large-Model Performance in Rheumatology - Summary - MDSpire

By Felde, Sabine; Buchkremer, Rüdiger; Chehab, Gamal; Thielscher, Christian; Distler, Jörg HW; Schneider, Matthias; Richter, Jutta G

April 23, 2026

Objective:

To evaluate how small language models (SLMs) enhanced with retrieval-augmented generation (RAG) perform in clinical decision support for rheumatology compared with large language models (LLMs).

Key Findings:
  • Mixtral-8x7b-32768 with RAG achieved the highest diagnostic (72%) and therapeutic (73%) F1 scores.
  • Nemotron-70b showed strong diagnostic capability without RAG (71%).
  • Qwen-Turbo excelled in therapeutic suggestions without retrieval (72%).
  • Mixtral with RAG recorded the highest RAGAS score (81%).
  • Performance varied significantly across models and configurations.
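
The F1 scores above combine precision and recall into one number. The summary does not say how the study matched model suggestions against reference answers, so the following is only a minimal sketch that assumes a simple set-based comparison; the function name `f1_score` and the example labels are hypothetical and not taken from the study.

```python
def f1_score(predicted: set[str], reference: set[str]) -> float:
    """Set-based F1 between predicted labels and reference labels.

    Assumption: a suggestion counts as correct only on an exact label match.
    """
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # No overlap (or an empty set): precision and recall are both zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for a single case, for illustration only.
predicted = {"rheumatoid arthritis", "gout", "psoriatic arthritis"}
reference = {"rheumatoid arthritis", "psoriatic arthritis", "osteoarthritis", "lupus"}
score = f1_score(predicted, reference)
print(f"F1 = {score:.2f}")
```

In this illustration two of three suggestions are correct (precision 2/3) and two of four reference labels are found (recall 1/2), so a model is penalized both for spurious suggestions and for missed ones.
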
Interpretation:

SLMs paired with RAG can match or exceed the performance of larger models in clinical decision support while requiring fewer computational resources.

Limitations:
  • Clinically relevant errors were present across all models.
  • Expert oversight and further real-world validation remain essential.
Conclusion:

The findings suggest that SLMs with RAG represent a viable and efficient alternative to larger models for rheumatology applications, though caution is warranted due to potential errors.
