Objective:
To develop and evaluate a multilayer classification system for voice disorder diagnosis, tailored to machine learning applications, and to determine its inter- and intra-rater reliability among otolaryngologists and speech-language pathologists.
Approach:
Key Findings:
Intra-rater reliability was high, with intraclass correlation coefficients ranging from 0.768 to 0.865.
Inter-rater reliability was strongest for identifying disordered vs. non-disordered voices (κ = 0.812; 95% CI, 0.733–0.891) and major aetiological categories (κ = 0.695; 95% CI, 0.611–0.779).
Agreement declined with increasing diagnostic specificity, particularly for perceptually based conditions.
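The layer-by-layer agreement pattern described above can be illustrated with a minimal sketch. It assumes two raters and Cohen's kappa with a simple large-sample standard error. The summary does not say which kappa variant or confidence interval method the study used, so this is not the authors' procedure. The ratings, the diagnosis labels, and the names `cohen_kappa` and `kappa_by_layer` are all hypothetical.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (kappa, ci_lower, ci_upper) for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    kappa = (p_o - p_e) / (1 - p_e)
    # Assumption: simple large-sample approximation of the standard error.
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return kappa, kappa - 1.96 * se, kappa + 1.96 * se


def kappa_by_layer(ratings_a, ratings_b):
    """Compute kappa separately for each layer of a multilayer label tuple."""
    n_layers = len(ratings_a[0])
    return [
        cohen_kappa([r[i] for r in ratings_a], [r[i] for r in ratings_b])
        for i in range(n_layers)
    ]


# Hypothetical ratings: (disordered or not, major category, specific diagnosis).
rater_1 = [
    ("disordered", "organic", "nodules"),
    ("disordered", "organic", "polyp"),
    ("disordered", "functional", "muscle tension dysphonia"),
    ("normal", "none", "none"),
    ("disordered", "neurological", "vocal fold paralysis"),
    ("normal", "none", "none"),
]
rater_2 = [
    ("disordered", "organic", "polyp"),
    ("disordered", "organic", "polyp"),
    ("disordered", "organic", "nodules"),
    ("normal", "none", "none"),
    ("disordered", "neurological", "vocal fold paralysis"),
    ("normal", "none", "none"),
]

results = kappa_by_layer(rater_1, rater_2)
for layer, (kappa, low, high) in enumerate(results, start=1):
    print(f"layer {layer}: kappa = {kappa:.3f} (95% CI {low:.3f} to {high:.3f})")
```

In this toy data the raters agree fully on the first layer and diverge on the specific diagnosis, so kappa falls as the labels become more specific. With more than two raters, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.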
Interpretation:
The multilayer framework improves diagnostic consistency and highlights areas of diagnostic ambiguity, supporting the development of reliable annotated datasets for machine learning tools.
Limitations:
The sample comprised only 45 adults, which may limit the generalizability of the findings.
Reliability may vary across different clinical settings and populations.
Conclusion:
The structured multilayer framework provides a practical foundation for machine learning applications in voice disorder classification.