Comparing ChatGPT's and Surgeon's Responses to Thyroid-related Questions From Patients

By
Siyin Guo
Ruicen Li
Genpeng Li
Wenjie Chen
Jing Huang
Linye He
Yu Ma
Liying Wang
Hongping Zheng
Chunxiang Tian
Yatong Zhao
Xinmin Pan
Hongxing Wan
Dasheng Liu
Zhihui Li
Jianyong Lei
April 10, 2024

The Journal of Clinical Endocrinology & Metabolism

Overview

This study compared ChatGPT-4.0 with junior and senior thyroid specialists in answering 30 common thyroid-related questions from patients. ChatGPT's responses were faster and longer, and they scored higher on accuracy, comprehensiveness, and compassion, as well as on satisfaction ratings from both patients and surgeons. These findings suggest that ChatGPT could serve as a supportive tool for patient education in thyroid disorders.

Background

Thyroid diseases such as hypothyroidism, Hashimoto thyroiditis, and thyroid nodules are prevalent and often require long-term management including surgery. Patients frequently have common questions during diagnosis and follow-up, but surgeon-patient communication is limited by time and resource constraints. Artificial intelligence, particularly large language models like ChatGPT, offers a promising solution to provide timely, accurate, and compassionate responses to patient inquiries. Prior studies have assessed ChatGPT in various medical domains but lacked multi-dimensional evaluation and validation by both patients and physicians in thyroid disease contexts.

Data Highlights

Metric	ChatGPT	Junior Specialist	Senior Specialist	Statistical Significance
Response Speed (median, IQR; units not stated, higher is faster)	8.69 (7.53-9.48)	4.33 (4.05-4.60)	4.22 (3.36-4.76)	P < .001 vs both specialists
Word Count (median, IQR)	341.50 (301.00-384.25)	74.50 (51.75-84.75)	104.00 (63.75-177.75)	P < .001 vs both specialists

Key Findings

ChatGPT responded significantly faster than both junior and senior thyroid specialists (median speed 8.69 vs 4.33 and 4.22; P < .001).
ChatGPT's responses were substantially longer, with a median of 341.5 words versus 74.5 for junior and 104 for senior specialists, roughly 4.6 and 3.3 times as long (P < .001).
ChatGPT scored higher than both specialists in accuracy, comprehensiveness, compassion, and overall satisfaction as rated by patients and surgeons.
ChatGPT correctly identified and addressed intentionally misleading questions, demonstrating logical reasoning capabilities.
Despite superior performance on common questions, further research is needed to validate ChatGPT's ability to handle complex thyroid-related clinical queries.
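
The length comparison in the findings above can be checked directly from the medians reported in the Data Highlights table. The following is a minimal sketch, not part of the study: the names MEDIAN_WORDS and length_ratios are hypothetical, and the calculation is a simple ratio of medians. It does not reproduce the study's significance tests, and a ratio of medians is not the same as the median of per-question ratios.

```python
# Median word counts copied from the Data Highlights table.
MEDIAN_WORDS = {
    "ChatGPT": 341.50,
    "Junior Specialist": 74.50,
    "Senior Specialist": 104.00,
}


def length_ratios(medians, reference="ChatGPT"):
    """Divide the reference group's median by each other group's median."""
    ref = medians[reference]
    return {
        name: ref / value
        for name, value in medians.items()
        if name != reference
    }


ratios = length_ratios(MEDIAN_WORDS)
for name, ratio in ratios.items():
    print(f"ChatGPT vs {name}: {ratio:.2f}x")
# ChatGPT vs Junior Specialist: 4.58x
# ChatGPT vs Senior Specialist: 3.28x
```

Both ratios exceed 3, consistent with the statement that ChatGPT's median response was more than three times as long as either specialist group's.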

Clinical Implications

ChatGPT-4.0 shows promise as an adjunct tool to enhance patient education and communication in thyroid disease management by providing rapid, accurate, and compassionate answers to common patient questions. Its use could alleviate time constraints faced by clinicians and improve patient understanding and satisfaction. However, clinicians should remain involved for complex or individualized cases until further validation of AI capabilities is available.

Conclusion

In this study, ChatGPT-4.0 outperformed junior and senior thyroid specialists in responding to common thyroid-related questions across multiple dimensions, highlighting its potential utility in clinical practice. Continued research is warranted to determine whether it has a role in complex clinical decision-making.