Comparing ChatGPT's and Surgeon's Responses to Thyroid-related Questions From Patients - Report - MDSpire

Comparing ChatGPT's and Surgeon's Responses to Thyroid-related Questions From Patients

By Siyin Guo, Ruicen Li, Genpeng Li, Wenjie Chen, Jing Huang, Linye He, Yu Ma, Liying Wang, Hongping Zheng, Chunxiang Tian, Yatong Zhao, Xinmin Pan, Hongxing Wan, Dasheng Liu, Zhihui Li, Jianyong Lei

April 10, 2024

Clinical Report: ChatGPT Outperforms Surgeons in Common Thyroid Question Responses

Overview

This study compared ChatGPT-4.0's answers to 30 common thyroid-related questions with those of junior and senior thyroid specialists. ChatGPT's responses were faster and longer, and they scored higher on accuracy, comprehensiveness, and compassion, as well as on satisfaction as rated by both patients and surgeons. These findings suggest that ChatGPT could serve as a supportive tool for patient education in thyroid disorders.

Background

Thyroid diseases such as hypothyroidism, Hashimoto thyroiditis, and thyroid nodules are prevalent and often require long-term management including surgery. Patients frequently have common questions during diagnosis and follow-up, but surgeon-patient communication is limited by time and resource constraints. Artificial intelligence, particularly large language models like ChatGPT, offers a promising solution to provide timely, accurate, and compassionate responses to patient inquiries. Prior studies have assessed ChatGPT in various medical domains but lacked multi-dimensional evaluation and validation by both patients and physicians in thyroid disease contexts.

Data Highlights

| Metric | ChatGPT | Junior Specialist | Senior Specialist | Statistical Significance |
|---|---|---|---|---|
| Response Speed (median, IQR) | 8.69 (7.53-9.48) | 4.33 (4.05-4.60) | 4.22 (3.36-4.76) | P < .001 vs both specialists |
| Word Count (median, IQR) | 341.50 (301.00-384.25) | 74.50 (51.75-84.75) | 104.00 (63.75-177.75) | P < .001 vs both specialists |

Key Findings

  • ChatGPT responded significantly faster than both junior and senior thyroid specialists (P < .001).
  • ChatGPT's responses were substantially longer: its median word count was more than three times that of senior specialists and more than four times that of junior specialists (P < .001).
  • ChatGPT scored higher than both specialists in accuracy, comprehensiveness, compassion, and overall satisfaction as rated by patients and surgeons.
  • ChatGPT correctly identified and addressed intentionally misleading questions, demonstrating logical reasoning capabilities.
  • Despite superior performance on common questions, further research is needed to validate ChatGPT's ability to handle complex thyroid-related clinical queries.
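
The word-count comparison in the findings above can be checked directly from the reported medians. The following is a minimal Python sketch that uses only the median values from the Data Highlights table; the variable and function names are illustrative and not part of the study. A ratio of medians is only a rough summary and says nothing about statistical significance.

```python
# Median word counts as reported in the Data Highlights table.
median_words = {
    "ChatGPT": 341.50,
    "Junior Specialist": 74.50,
    "Senior Specialist": 104.00,
}


def ratio_to_chatgpt(medians):
    """Return ChatGPT's median word count divided by each specialist's median."""
    chatgpt = medians["ChatGPT"]
    return {
        name: round(chatgpt / value, 2)
        for name, value in medians.items()
        if name != "ChatGPT"
    }


ratios = ratio_to_chatgpt(median_words)
for name, ratio in ratios.items():
    print(f"ChatGPT vs {name}: {ratio}x")
```

With the reported medians, this gives a ratio of roughly 4.6 against junior specialists and roughly 3.3 against senior specialists.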

Clinical Implications

ChatGPT-4.0 shows promise as an adjunct tool to enhance patient education and communication in thyroid disease management by providing rapid, accurate, and compassionate answers to common patient questions. Its use could alleviate time constraints faced by clinicians and improve patient understanding and satisfaction. However, clinicians should remain involved for complex or individualized cases until further validation of AI capabilities is available.

Conclusion

ChatGPT-4.0 outperforms junior and senior thyroid specialists in responding to common thyroid-related questions across multiple dimensions, highlighting its potential utility in clinical practice. Continued research is warranted to confirm its role in complex clinical decision-making.
