Benchmark Integrity and Reasoning-Trace Errors in Medical Question Answering With Large Language Models: Mixed Methods Study With Sparse Autoencoders - Poll - MDSpire

  • By Jialin Liu, Siru Liu, Adam Wright

  • June 12, 2026
