AI Model Trails Expert Skin Lesion Readers

A foundation artificial intelligence model surpassed less experienced physicians but did not outperform expert dermatologists in multiclass skin lesion diagnosis.

By Andrea Surnit
June 27, 2026

Conexiant

Objective:

To compare the diagnostic accuracy of artificial intelligence (AI) systems with that of physicians in diagnosing skin lesions.

Approach:

Study Design: A prospective diagnostic study using retrospectively collected images from 1,117 skin lesion cases, comparing 3 AI systems with physician readers.
AI Systems: Included a first-generation convolutional neural network and 2 configurations of the PanDerm foundation model (unimodal and multimodal).
Participants: 652 physicians contributed 1,092 completed test iterations, with varying levels of dermoscopy experience.
Outcomes: Primary outcome was multiclass diagnostic accuracy; secondary outcomes included sensitivity, specificity, and area under the receiver operating characteristic curve.
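The outcome measures listed above have standard definitions. The sketch below shows how each is typically computed. The article does not describe the study's scoring code, so this is only an illustration under stated assumptions. The function names, diagnosis labels, and toy data are hypothetical. Binary labels are assumed to be 1 for the positive class (for example, malignant) and 0 otherwise. AUROC is estimated with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
from typing import Sequence


def multiclass_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of cases whose predicted diagnosis matches the reference diagnosis."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Binary labels (assumption): 1 = positive, e.g. malignant; 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not from the study
print(multiclass_accuracy(["melanoma", "nevus", "bcc", "nevus"],
                          ["melanoma", "nevus", "nevus", "nevus"]))  # 0.75
print(sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))    # (0.5, 0.666...)
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.4]))                    # 0.875
```

The pairwise AUROC loop is quadratic in the number of cases. It is kept here for readability. A production analysis would more likely use a library routine such as scikit-learn's `roc_auc_score`.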

Key Findings:

Physicians with more than 10 years of experience had the highest diagnostic accuracy at 74%.
The unimodal PanDerm model achieved 72% accuracy, outperforming less experienced physicians.
In binary discrimination, the unimodal model had the highest balanced accuracy at 0.82.
The multimodal model performed worse than the unimodal model despite having access to additional clinical information.
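Balanced accuracy, the binary metric reported above, is conventionally defined as the mean of sensitivity and specificity. It is therefore not inflated when one class, such as benign lesions, dominates the test set. The article does not report the sensitivity and specificity behind the 0.82 figure. The values in the worked example below are hypothetical and show only how such a score can arise:

```latex
\text{Balanced accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2},
\qquad \text{e.g.}\quad \frac{0.90 + 0.74}{2} = 0.82
```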

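Interpretation:

The unimodal PanDerm foundation model outperformed less experienced physicians in multiclass skin lesion diagnosis but did not outperform dermatologists with more than 10 years of experience.
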
Limitations:

Images were retrospectively collected and curated for educational purposes rather than to reflect clinical prevalence.
The benign-to-malignant ratio differed from routine practice.
Darker skin phototypes were underrepresented, and combined physician-AI decision-making was not evaluated.