Objective:
To systematically evaluate the quality of diabetes health education texts generated by various generative AI models.
Approach:
Selection of AI Models: Seven generative AI models were selected for evaluation: ERNIE Bot-3.5, iFlytek Spark-V3.5, Kimi-K1.5, ChatGPT-4o, Tiangong-AI2.2.0, Doubao Large Model, and Deepseek-R1.
Text Generation: Ten common questions about diabetes health education were posed to each AI model, and the responses were collected as the texts for evaluation.
Quality Evaluation: Five experts evaluated the quality of the generated texts, drawing on their clinical experience and background in diabetes health education.
Key Findings:
Existing research lacks a comprehensive assessment framework for AI-generated health education texts.
Generative AI models vary significantly in the accuracy and quality of the information they produce.
A systematic evaluation can help users select appropriate AI models based on their needs.
Interpretation:
The study highlights the need for a multi-dimensional framework to evaluate the quality of AI-generated health education texts.
Limitations:
The study focused solely on diabetes education texts, so its findings may not generalize to other health topics.
The evaluation was limited to seven AI models, which may not represent the full spectrum of available generative AI tools.
Conclusion:
A comprehensive evaluation of AI-generated health education texts is essential for improving public health literacy and guiding the selection of appropriate AI tools.