Objective:
To evaluate the performance of smaller language models (SLMs) enhanced with retrieval-augmented generation (RAG), compared with larger language models (LLMs), in clinical decision support for rheumatology.
Key Findings:
Mixtral-8x7b-32768 with RAG achieved the highest diagnostic (72%) and therapeutic (73%) F1 scores.
Nemotron-70b showed strong diagnostic capability without RAG (71%).
Qwen-Turbo excelled in therapeutic suggestions without retrieval (72%).
Mixtral with RAG recorded the highest RAGAS score (81%).
Performance varied significantly across models and configurations.
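For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The summary does not state how the diagnostic and therapeutic F1 scores were computed, so the following is only a minimal sketch under the assumption of a set-based comparison between model suggestions and an expert reference; the function name `f1_score` and the example labels are hypothetical.

```python
def f1_score(predicted, reference):
    """Set-based F1 between predicted and reference labels (assumed scoring scheme)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        # No overlap (or an empty set): precision or recall is zero, so F1 is zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical case: the model suggests three diagnoses, the expert lists two.
suggested = ["rheumatoid arthritis", "psoriatic arthritis", "gout"]
expert = ["rheumatoid arthritis", "gout"]
score = f1_score(suggested, expert)  # precision 2/3, recall 1
```

Under this scheme, an extra incorrect suggestion lowers precision while a missed reference item lowers recall, so a high F1 requires both.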
Interpretation:
SLMs paired with RAG can match or exceed the performance of larger models in clinical decision support while requiring fewer computational resources.
Limitations:
Clinically relevant errors were present across all models.
Expert oversight and further real-world validation remain essential.
Conclusion:
The findings suggest that SLMs with RAG represent a viable and efficient alternative to larger models for rheumatology applications, though caution is warranted because clinically relevant errors occurred across all models.