Low-energy Small Language Models with Retrieval-Augmented Generation can Surpass Large-Model Performance in Rheumatology

By
Felde, Sabine
Buchkremer, Rüdiger
Chehab, Gamal
Thielscher, Christian
Distler, Jörg HW
Schneider, Matthias
Richter, Jutta G
April 23, 2026

Frontiers in Medicine

Objective:

To evaluate the performance of smaller language models (SLMs) enhanced with retrieval-augmented generation (RAG) in clinical decision support for rheumatology compared to larger language models (LLMs).

Key Findings:

Mixtral-8x7b-32768 with RAG achieved the highest diagnostic (72%) and therapeutic (73%) F1 scores.
Nemotron-70b showed strong diagnostic performance without RAG (diagnostic F1 score, 71%).
Qwen-Turbo performed best on therapeutic suggestions without retrieval (therapeutic F1 score, 72%).
Mixtral with RAG recorded the highest RAGAS score (81%).
Performance varied significantly across models and configurations.
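The F1 scores reported in these findings combine precision (the share of a model's suggestions that are correct) and recall (the share of correct answers the model recovers). The Python sketch below shows one common way to score a set of suggested diagnoses or therapies against a reference list. The function name `set_f1`, the exact-match rule after lowercasing, and the example case are illustrative assumptions, not the study's published scoring protocol.

```python
def set_f1(predicted, reference):
    """F1 score between a model's suggested items and a reference set.

    Assumption (hypothetical): items match only if their lowercased,
    whitespace-trimmed strings are identical. The study may have used
    a different matching or scoring protocol.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {r.strip().lower() for r in reference}
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)            # suggestions that also appear in the reference
    if tp == 0:
        return 0.0
    precision = tp / len(pred)      # fraction of suggestions that are correct
    recall = tp / len(ref)          # fraction of reference items recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical case: three suggested diagnoses, two reference diagnoses.
suggested = ["Rheumatoid arthritis", "Psoriatic arthritis", "Gout"]
reference = ["rheumatoid arthritis", "gout"]
print(round(set_f1(suggested, reference), 2))  # → 0.8
```

Here precision is 2/3 and recall is 1, giving an F1 of 0.8. A study-level percentage would typically be an average of such per-case scores, although the exact aggregation used in the paper is not stated here.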

Interpretation:

SLMs paired with RAG can match or exceed the performance of larger models in clinical decision support while requiring fewer computational resources.

Limitations:

Clinically relevant errors were present across all models.
Expert oversight and further real-world validation remain essential.

Conclusion:

The findings suggest that SLMs with RAG represent a viable and efficient alternative to larger models for rheumatology applications, though caution is warranted due to potential errors.

Original Source(s)

Low-energy Small Language Models with Retrieval-Augmented Generation can Surpass Large-Model Performance in Rheumatology