Accuracy of AI Laryngeal Disorder Detection - Summary - MDSpire
Accuracy of AI Laryngeal Disorder Detection
A review of 88 studies found that AI systems identified abnormal voices with high accuracy, but performance declined as classification became more specific, from broad disorder categories down to individual laryngeal disorders.
Objective:
To evaluate the accuracy of artificial intelligence systems in detecting laryngeal disorders.
Approach:
Key Findings:
AI models performed best in binary classification tasks, with accuracies ranging from 88% to 99% for distinguishing healthy from pathologic voices.
Performance declined to approximately 70% to 90% for broader pathophysiologic categories and generally remained below 75% for specific disorders.
AI performance varied by model architecture and data type: traditional machine-learning models achieved 88% to 96% accuracy on binary tasks, while deep-learning systems achieved 97% to 99% on standardized datasets.
Most studies relied on internal validation, and performance often declined by 10 to 20 percentage points on independent cohorts.
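The gap between internal validation and independent cohorts is stated in percentage points, which is an absolute difference between two accuracies and not a relative change. The sketch below works through that arithmetic in Python. The confusion-matrix counts are hypothetical, chosen only so that the drop falls inside the reported 10 to 20 point range; they are not taken from any of the reviewed studies.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def percentage_point_drop(internal: float, external: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return (internal - external) * 100


# Hypothetical counts (assumption, for illustration only).
internal_acc = accuracy(tp=95, tn=95, fp=5, fn=5)    # 190 of 200 correct
external_acc = accuracy(tp=82, tn=78, fp=22, fn=18)  # 160 of 200 correct

drop = percentage_point_drop(internal_acc, external_acc)
relative = (internal_acc - external_acc) / internal_acc * 100

print(f"internal={internal_acc:.2f} external={external_acc:.2f}")
print(f"drop={drop:.1f} percentage points ({relative:.1f}% relative)")
```

With these made-up counts, accuracy falls from 0.95 to 0.80, a drop of 15 percentage points, which corresponds to a relative decline of about 15.8%.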
Interpretation:
The decline in performance from detection to diagnosis is attributed to acoustic overlap among laryngeal disorders, where distinct diseases can produce similar voice abnormalities.
Limitations:
Many studies had methodological concerns, including dependence on limited databases, class imbalance, and lack of demographic diversity.
Approximately 82% of studies used sustained-vowel tasks, which may not capture clinically relevant vocal variability.
Fewer than 15% of studies shared source code or complete model documentation, limiting reproducibility.
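Class imbalance matters because plain accuracy can look strong even when a model never recognizes the minority class. The Python sketch below illustrates this with a hypothetical dataset of 90 pathologic and 10 healthy voices and a trivial classifier that labels every voice pathologic. Balanced accuracy (the mean of per-class recall) is shown as one common alternative; the source does not say which metrics the reviewed studies should have used, so treat this only as an illustration of the concern.

```python
def plain_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced dataset: 1 = pathologic, 0 = healthy.
y_true = [1] * 90 + [0] * 10
# Trivial classifier (assumption): predicts "pathologic" for everyone.
y_pred = [1] * 100

acc = plain_accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f} balanced_accuracy={bal:.2f}")
```

Here plain accuracy is 0.90 even though no healthy voice is identified correctly, while balanced accuracy is 0.50, the same as chance on a two-class task.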
Conclusion:
Current evidence supports AI primarily as a tool for screening, triage, and decision support rather than as an autonomous diagnostic system.