Knowledge localization is associated with higher performance of domestic large language models in a Chinese radiation oncology examination - Summary - MDSpire
Objective:
To evaluate the performance of domestic and international large language models (LLMs) on a Chinese radiation oncology examination.
Approach:
Key Findings:
Among domestic models, Qwen in particular achieved an accuracy of 86.30%, surpassing the single physician reference participant (adjusted P = 0.020).
International models showed a marked decline in performance, particularly in localized knowledge retrieval, emphasizing the need for alignment with regional standards.
Translating the examination into English did not improve performance for international models and revealed a significant language penalty for some domestic architectures (e.g., DeepSeek, P = 0.013).
Error analysis indicated that failures in international models were primarily due to discrepancies between Western and Chinese clinical guidelines.
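The adjusted P values reported above imply paired comparisons on the same questions with a correction for multiple testing. The summary does not name the test or the correction used, so the following is only a minimal sketch under stated assumptions: an exact McNemar test on discordant answers and a Holm step-down adjustment. The function names, model labels, and counts are all hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value for paired right/wrong answers.

    b: questions the model answered correctly and the reference missed.
    c: questions the reference answered correctly and the model missed.
    Assumption: the study's actual test is not stated in the summary.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant questions, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjustment (assumed correction, not confirmed by the source)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical discordant counts (model-only correct, reference-only correct).
discordant = {"model_a": (15, 3), "model_b": (8, 8), "model_c": (4, 12)}
raw = [mcnemar_exact_p(b, c) for b, c in discordant.values()]
for name, p_raw, p_adj in zip(discordant, raw, holm_adjust(raw)):
    print(f"{name}: raw P = {p_raw:.4f}, adjusted P = {p_adj:.4f}")
```

A paired test is a natural choice here because every model and the physician answer the same questions, so only the questions on which they disagree carry information about a difference in accuracy.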
Interpretation:
The findings suggest that alignment with regional clinical standards is a significant factor in model performance on this examination.
Limitations:
Only one human participant was included for comparison, which limits the generalizability of the results and may affect the conclusions drawn.
Differences in model architecture, training data, and potential test-set contamination may also affect outcomes.
Conclusion:
The study highlights the importance of localized knowledge in enhancing the performance of LLMs in specialized medical assessments.