Low-energy Small Language Models with Retrieval-Augmented Generation can Surpass Large-Model Performance in Rheumatology - Report - MDSpire

By Felde, Sabine; Buchkremer, Rüdiger; Chehab, Gamal; Thielscher, Christian; Distler, Jörg HW; Schneider, Matthias; Richter, Jutta G

April 23, 2026

Clinical Report: Smaller Language Models Enhanced with Retrieval-Augmented Generation

Overview

This study evaluates smaller language models (SLMs) enhanced with retrieval-augmented generation (RAG) on diagnostic and therapeutic tasks in rheumatology. The best-performing SLM configuration, Mixtral-8x7b-32768 with RAG, matched or slightly exceeded the F1 scores of larger models run without RAG (72% vs. 71% diagnostic, 73% vs. 72% therapeutic) while requiring fewer computational resources.

Background

The integration of artificial intelligence in clinical decision support is gaining traction, particularly in complex fields like rheumatology. Large language models (LLMs) face challenges related to computational demands and potential inaccuracies, making smaller models with RAG a promising alternative. Understanding their efficacy is crucial for improving clinical outcomes and resource efficiency.

Data Highlights

Model | Diagnostic F1 Score | Therapeutic F1 Score | RAGAS Score
Mixtral-8x7b-32768 with RAG | 72% | 73% | 81%
Nemotron-70b without RAG | 71% | N/A | N/A
Qwen-Turbo without RAG | N/A | 72% | N/A

Key Findings

  • Mixtral-8x7b-32768 with RAG achieved the highest diagnostic (72%) and therapeutic (73%) F1 scores.
  • Nemotron-70b demonstrated strong diagnostic capability without RAG (71%).
  • Qwen-Turbo achieved a strong therapeutic F1 score without RAG (72%).
  • The highest RAGAS score was recorded for Mixtral with RAG (81%).
  • Performance varied considerably across models and configurations.
  • Clinically relevant errors were noted across all models, necessitating expert oversight.
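
For readers less familiar with the metric, the F1 scores reported above combine precision and recall into a single number (their harmonic mean). The following is a minimal sketch of that calculation, not the study's evaluation pipeline: the function name `f1_score` and the counts used are hypothetical, and the report does not state how true and false positives were counted or averaged across cases.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 when undefined)."""
    predicted = true_positives + false_positives  # everything the model suggested
    actual = true_positives + false_negatives     # everything the reference contains
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen for illustration only; not taken from the study.
example = f1_score(true_positives=36, false_positives=14, false_negatives=14)
print(f"{example:.2f}")  # 0.72
```

Because F1 penalizes both missed and spurious suggestions, a model can post a respectable score and still produce individual errors that matter clinically, which is consistent with the call for expert oversight above.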

Clinical Implications

The findings suggest that smaller language models with RAG can serve as effective tools for clinical decision support in rheumatology, potentially reducing computational costs. However, the presence of clinically relevant errors underscores the importance of expert validation in their application.

Conclusion

SLMs enhanced with RAG represent a viable alternative to larger models in clinical settings, offering comparable performance with reduced resource demands. Continued evaluation and oversight are essential for safe implementation.
