Clinical Report: ChatGPT Outperforms Surgeons in Answering Common Thyroid Questions
Overview
This study compared ChatGPT-4.0 with junior and senior thyroid specialists in answering 30 common thyroid-related questions. ChatGPT's responses were faster and longer, and they scored higher for accuracy, comprehensiveness, and compassion, as well as for satisfaction as rated by patients and surgeons. These findings suggest that ChatGPT has potential as a supportive tool for patient education in thyroid disorders.
Background
Thyroid diseases such as hypothyroidism, Hashimoto thyroiditis, and thyroid nodules are prevalent and often require long-term management including surgery. Patients frequently have common questions during diagnosis and follow-up, but surgeon-patient communication is limited by time and resource constraints. Artificial intelligence, particularly large language models like ChatGPT, offers a promising solution to provide timely, accurate, and compassionate responses to patient inquiries. Prior studies have assessed ChatGPT in various medical domains but lacked multi-dimensional evaluation and validation by both patients and physicians in thyroid disease contexts.
Data Highlights
| Metric | ChatGPT | Junior Specialist | Senior Specialist | Statistical Significance |
| --- | --- | --- | --- | --- |
| Response Speed (median, IQR) | 8.69 (7.53-9.48) | 4.33 (4.05-4.60) | 4.22 (3.36-4.76) | P < .001 vs both specialists |
| Word Count (median, IQR) | 341.50 (301.00-384.25) | 74.50 (51.75-84.75) | 104.00 (63.75-177.75) | P < .001 vs both specialists |
Key Findings
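ChatGPT responded significantly faster than both junior and senior thyroid specialists (median response speed 8.69 vs 4.33 and 4.22, respectively; P < .001).
ChatGPT's responses were substantially longer, with a median word count (341.50) more than three times that of the senior specialist (104.00) and more than four times that of the junior specialist (74.50) (P < .001).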
ChatGPT scored higher than both specialists in accuracy, comprehensiveness, compassion, and overall satisfaction as rated by patients and surgeons.
Despite superior performance on common questions, further research is needed to validate ChatGPT's ability to handle complex thyroid-related clinical queries.
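The speed and length comparisons above can be checked directly from the medians reported under Data Highlights. The following is a minimal sketch, not part of the study's analysis: the dictionary layout and the helper name `ratio_to_chatgpt` are illustrative assumptions, and the result is a ratio of medians, not a median of per-question ratios.

```python
# Medians copied from the Data Highlights section of this report.
# The dictionary layout and function name are illustrative assumptions.
MEDIANS = {
    "response_speed": {
        "ChatGPT": 8.69,
        "Junior Specialist": 4.33,
        "Senior Specialist": 4.22,
    },
    "word_count": {
        "ChatGPT": 341.50,
        "Junior Specialist": 74.50,
        "Senior Specialist": 104.00,
    },
}


def ratio_to_chatgpt(medians):
    """Divide ChatGPT's median by each specialist's median for one metric."""
    base = medians["ChatGPT"]
    return {
        name: base / value
        for name, value in medians.items()
        if name != "ChatGPT"
    }


# Compute the ratios for every metric and print them to two decimals.
ratios = {metric: ratio_to_chatgpt(values) for metric, values in MEDIANS.items()}

for metric, by_group in ratios.items():
    for group, ratio in by_group.items():
        print(f"{metric}: ChatGPT vs {group} = {ratio:.2f}x")
```

Under these assumptions, the ratios come out at roughly 2.0x for response speed against both specialist groups, and roughly 4.6x (junior) and 3.3x (senior) for word count, which is consistent with the "over three times" statement.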
Clinical Implications
ChatGPT-4.0 shows promise as an adjunct tool to enhance patient education and communication in thyroid disease management by providing rapid, accurate, and compassionate answers to common patient questions. Its use could alleviate time constraints faced by clinicians and improve patient understanding and satisfaction. However, clinicians should remain involved for complex or individualized cases until further validation of AI capabilities is available.
Conclusion
ChatGPT-4.0 outperformed junior and senior thyroid specialists in responding to common thyroid-related questions across multiple dimensions, highlighting its potential utility in clinical practice. Continued research is warranted to confirm its role in complex clinical decision-making.
References
Huayitong App Data and Ethics Approval, 2023 -- Source of Thyroid Questions
OpenAI, 2023 -- ChatGPT GPT-4.0 Launch and Capabilities
West China Hospital of Sichuan University, 2023 -- Study Ethics and Data Source