Clinical Report: Patient-Centric Evaluation of CNN and Transformer Models
Overview
This study evaluates nine deep learning models, spanning CNN and Transformer architectures, for breast cancer histopathology classification. It uses a patient-aware protocol that avoids the data leakage found in prior research. Under this controlled evaluation, architectural differences among the models did not produce statistically significant differences in performance.
Background
Breast cancer is a leading cause of cancer-related mortality, so accurate diagnosis is essential for effective treatment. Histopathological examination is the diagnostic gold standard, but manual interpretation is subjective and time-consuming. Deep learning systems can improve diagnostic efficiency and consistency, yet prior studies have been limited by data leakage and inconsistent evaluation methods.
Key Findings
ResNet50 achieved the highest mean accuracy (0.9267 ± 0.0435) and an F1-score of 0.9472.
All nine models performed comparably, with mean accuracies ranging from 0.91 to 0.93.
No statistically significant differences were found among the models (p > 0.05 after correction for multiple comparisons).
Lower and intermediate magnifications (40× and 200×) provided more discriminative features than the highest magnification (400×).
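A patient-aware cross-validation protocol was implemented to prevent data leakage: all images from a given patient are assigned to the same fold, so no patient contributes images to both the training and test sets.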
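The patient-aware protocol amounts to grouped k-fold splitting, where whole patients rather than individual images are assigned to folds. The sketch below is a minimal pure-Python illustration under stated assumptions: the function name `patient_aware_folds`, the shuffled round-robin assignment, and the example patient IDs are hypothetical and are not taken from the study, which does not describe its exact fold assignment.

```python
import random


def patient_aware_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no patient spans both sides.

    patient_ids[i] is the (hypothetical) patient identifier of image i.
    Whole patients are assigned to folds, never individual images.
    """
    patients = sorted(set(patient_ids))
    if n_folds > len(patients):
        raise ValueError("n_folds cannot exceed the number of patients")

    # Shuffle patients reproducibly, then assign them round-robin to folds.
    random.Random(seed).shuffle(patients)
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}

    for k in range(n_folds):
        # Every image inherits the fold of its patient.
        test_idx = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == k]
        train_idx = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != k]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical image-level patient IDs: eight images from five patients.
    ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5"]
    for train, test in patient_aware_folds(ids, n_folds=3):
        # No patient may appear in both the training and the test split.
        assert not {ids[i] for i in train} & {ids[i] for i in test}
        print(sorted({ids[i] for i in test}))
```

Round-robin assignment balances the number of patients per fold, not the number of images or the class ratio; a real pipeline might instead use a stratified grouped splitter such as scikit-learn's `StratifiedGroupKFold`.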
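The report does not state which statistical test or correction produced the "p > 0.05 after correction" result. As one common option, a Holm-Bonferroni adjustment across all pairwise model comparisons can be sketched as below. The choice of Holm, the function name `holm_adjust`, and the example p-values are assumptions for illustration only, not the study's method or results. With nine models there are C(9, 2) = 36 pairwise comparisons to correct for.

```python
from math import comb


def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest p-value (0-based) is multiplied by (m - rank), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must be non-decreasing along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Nine models compared pairwise gives 36 hypothesis tests.
n_models = 9
n_comparisons = comb(n_models, 2)

if __name__ == "__main__":
    # Hypothetical raw p-values for three comparisons (not from the study).
    raw = [0.01, 0.04, 0.03]
    adjusted = holm_adjust(raw)
    alpha = 0.05
    significant = [p <= alpha for p in adjusted]
    print(n_comparisons, adjusted, significant)
```

Holm controls the family-wise error rate like Bonferroni but is never less powerful. With 36 comparisons, even a raw p-value of 0.01 for the smallest comparison becomes 0.36 after adjustment.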
Clinical Implications
The study underscores the importance of rigorous evaluation protocols in developing AI systems for breast cancer diagnosis. When integrating AI tools into practice, clinicians should check whether reported performance was measured with patient-level separation between training and test data, since results from leaky evaluations may overstate how a model will perform on new patients.
Conclusion
The findings suggest that the nine deep learning models perform similarly in breast cancer histopathology classification and that evaluation design is crucial for obtaining reliable performance estimates. This study provides a foundation for future research on clinically applicable AI systems.