Concordance among experts in assessing apical mucosal preservation during holmium laser enucleation of the prostate (HoLEP): implications for artificial intelligence model development - Report - MDSpire

By Archan Khandekar, Aravindh Rathinam, Ansh Bhatia, Diana M. Lopategui, Jonathan Katz, Roger L. Sur, Nicholas Smith, Pankaj N. Maheshwari, Hemendra N. Shah

December 4, 2025

Expert Agreement on Apical Mucosal Preservation Evaluation in HoLEP Videos

Overview

This study assessed interrater reliability among six expert urologists rating apical mucosal preservation in HoLEP surgical videos and explored associations with postoperative urinary continence. Moderate agreement was observed, supporting the feasibility of using expert-labeled data to train AI models for intraoperative assessment.

Background

Holmium laser enucleation of the prostate (HoLEP) is a standard surgical treatment for benign prostatic hyperplasia, but transient stress urinary incontinence (TSUI) remains a common early postoperative complication. Technical refinements such as improved apical mucosal preservation have reduced TSUI rates. Artificial intelligence (AI) offers potential to automate intraoperative assessments, but requires reliable expert annotations to train models accurately. This study evaluates expert concordance in classifying apical mucosal preservation from surgical videos as a foundation for AI development.

Data Highlights

Metric | Value
Number of videos reviewed | 60
Number of expert raters | 6
Rating scale | 3-tier (completely preserved, partially preserved, not preserved)
Postoperative continence follow-up | 6 weeks
Interrater agreement measure | Fleiss’ multi-rater Kappa (κ)

Key Findings

  • Six expert urologists independently rated apical mucosal preservation on a 3-level scale from anonymized HoLEP videos.
  • Interrater reliability, measured with Fleiss’ multi-rater Kappa (κ), was moderate, a level of concordance considered acceptable for clinical annotation.
  • Ratings were compared with patient-reported urinary continence outcomes at 6 weeks postoperatively to explore associations.
  • Majority vote consensus ratings were used to define ground truth for potential AI model training.
  • No formal training or consensus meetings were conducted prior to rating, reflecting natural variability in expert judgment.
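
The agreement statistic and the consensus step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the category labels, the function names `fleiss_kappa` and `majority_vote`, and the choice to return `None` on a tied vote are all assumptions, since the report does not say how ties among six raters were resolved.

```python
from collections import Counter

# Hypothetical labels for the 3-tier scale (completely / partially / not preserved).
CATEGORIES = ("complete", "partial", "none")


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for a list of videos, each a list with one label per rater."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every video needs the same number of ratings")

    # counts[i][j] = number of raters who put video i in category j
    counts = []
    for row in ratings:
        tally = Counter(row)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean over videos of the share of agreeing rater pairs.
    pairs = n_raters * (n_raters - 1)
    p_items = [(sum(n * n for n in row) - n_raters) / pairs for row in counts]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall proportion of ratings in each category.
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


def majority_vote(labels):
    """Most frequent label, or None when the top two labels tie (an assumption)."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

Applied to a 60 × 6 matrix of labels, `fleiss_kappa` would return a single κ for the whole rater panel, and `majority_vote` applied per video would give the consensus label used as ground truth. An even number of raters makes ties possible, so any real pipeline would need an explicit tie-breaking rule.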

Clinical Implications

The moderate interrater agreement supports the feasibility of developing AI algorithms to assess apical mucosal preservation intraoperatively during HoLEP. Reliable expert annotations are critical to train models that can predict postoperative continence outcomes and potentially guide surgical technique or postoperative management. Incorporating standardized rating criteria and consensus-building may further improve annotation quality for AI applications.

Conclusion

Expert urologists demonstrate moderate concordance in visually classifying apical mucosal preservation in HoLEP videos, providing a viable foundation for supervised AI model development. This approach may enhance intraoperative assessment and prognostication of urinary continence outcomes.
