Using GPT-4 to annotate the severity of all phenotypic abnormalities within the human phenotype ontology - Summary - MDSpire

By Kitty B. Murphy, Brian M. Schilder, and Nathan G. Skene

May 21, 2026

Objective:

To use GPT-4 to automate annotation of the clinical severity of all phenotypic abnormalities in the Human Phenotype Ontology (HPO), making clinical metadata curation more efficient.

Key Findings:
  • GPT-4 achieved true positive recall rates of 89% to 100% (mean = 97%) in severity annotation, indicating that it closely replicates expert-level curation.
  • The severity scoring system integrates both the type and frequency of clinical characteristics, providing a comprehensive metric for assessing phenotypic severity.
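
To make the idea of combining type and frequency concrete, here is a minimal sketch of one way such a score could be computed. It is an illustration under stated assumptions, not the study's actual formula: the characteristic names, the weights, the frequency labels, and the 0 to 100 normalization are all hypothetical values chosen for this example.

```python
# Hypothetical weights for the TYPE of clinical characteristic.
# These numbers are illustrative assumptions, not values from the study.
CHARACTERISTIC_WEIGHTS = {
    "death": 6.0,
    "intellectual_disability": 5.0,
    "impaired_mobility": 4.0,
    "blindness": 3.0,
}

# Hypothetical weights for the FREQUENCY with which a phenotype
# causes a given characteristic (also an assumption).
FREQUENCY_WEIGHTS = {"never": 0, "rarely": 1, "often": 2, "always": 3}


def severity_score(annotations):
    """Combine type and frequency into a single score from 0 to 100.

    `annotations` maps a characteristic name to a frequency label.
    Characteristics that are not annotated are treated as "never".
    """
    max_frequency = max(FREQUENCY_WEIGHTS.values())
    max_total = sum(CHARACTERISTIC_WEIGHTS.values()) * max_frequency

    total = 0.0
    for characteristic, weight in CHARACTERISTIC_WEIGHTS.items():
        frequency = annotations.get(characteristic, "never")
        total += weight * FREQUENCY_WEIGHTS[frequency]

    # Normalize so the most severe possible annotation scores 100.
    return 100.0 * total / max_total


# Example: a phenotype that often impairs mobility and rarely causes death.
example = {"impaired_mobility": "often", "death": "rarely"}
print(round(severity_score(example), 1))
```

Weighting by both dimensions means a phenotype that always causes a milder characteristic can rank alongside one that rarely causes a more serious one, which is the kind of trade-off a combined type-and-frequency metric is meant to capture.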
Interpretation:

The study indicates that large language models (LLMs) such as GPT-4 can effectively automate the curation of clinical metadata, reducing the need for manual expert annotation and making curation more efficient.

Limitations:
  • The study primarily focused on phenotypic severity without exploring other clinical dimensions, which may limit the applicability of the findings.
  • Validation against independent clinical datasets is still needed to confirm that the annotations are robust and reliable.
Conclusion:

The findings provide a foundation for systematically ranking human phenotypes by their health impact, aiding in therapeutic prioritization for rare diseases and enhancing clinical decision-making.
