Benchmark Integrity and Reasoning-Trace Errors in Medical Question Answering With Large Language Models: Mixed Methods Study With Sparse Autoencoders
By Jialin Liu, Siru Liu, Adam Wright
June 12, 2026