AI Scribes Lag Clinicians on Note Quality - Summary - MDSpire

By Kerri Miller, April 17, 2026

Objective:

To compare the quality of notes generated by AI scribe tools with that of notes written by clinicians in simulated primary care scenarios.

Key Findings:
  • Clinician-written notes scored higher than AI-generated notes in all five cases, and the differences were statistically significant in three of them.
  • The largest gap emerged in the acute low back pain scenario, where human notes averaged 43.8 points compared with 20.3 points for AI-generated notes.
  • AI-generated notes scored lower in all 10 quality domains, with the largest deficits in thoroughness, organization, and usefulness.

Interpretation:

The findings suggest that although AI scribes may improve efficiency, they currently produce lower-quality documentation than clinicians do, a gap that could have meaningful consequences for patient care.

Limitations:
  • Simulated cases may not reflect real-world clinical complexity.
  • Human notes were not produced in typical clinical workflows.
  • Rater blinding may have been imperfect, and the Physician Documentation Quality Instrument (PDQI-9) may not fully capture errors specific to AI-generated notes.
  • Vendors were not permitted to generate multiple iterations of each note, which may have affected AI performance.

Conclusion:

AI scribes should be used to produce draft documentation that clinicians review and edit thoroughly, not as a replacement for clinician-authored notes.
