Using GPT-4 to annotate the severity of all phenotypic abnormalities within the human phenotype ontology

By Kitty B. Murphy, Brian M. Schilder, Nathan G. Skene
May 21, 2026

Frontiers in Digital Health

Objective:

To automate the annotation of clinical severity for phenotypic abnormalities in the Human Phenotype Ontology (HPO) using GPT-4, enhancing the efficiency of clinical metadata curation.

Key Findings:

GPT-4 achieved true positive recall rates between 89% and 100% (mean = 97%) in severity annotation, indicating that it closely replicated expert-level curation.
The severity scoring system combines both the type and the frequency of clinical characteristics into a single metric for assessing phenotypic severity.
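The summary does not say exactly how type and frequency are combined. The Python sketch below shows one possible way such a score could work, under loudly stated assumptions. The characteristic names, their weights, and the four-level frequency scale are hypothetical placeholders, not the values used in the study. The helper names `severity_score` and `rank_phenotypes` are likewise illustrative. In this sketch, each characteristic contributes its weight multiplied by a frequency value, and the total is normalized to a 0 to 100 range so that phenotypes can be ranked.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL frequency scale: how often a phenotype leads to a characteristic.
FREQUENCY_VALUES: Dict[str, int] = {"never": 0, "rarely": 1, "often": 2, "always": 3}

# HYPOTHETICAL weights per characteristic type (illustrative, not from the study).
CHARACTERISTIC_WEIGHTS: Dict[str, float] = {
    "death": 6.0,
    "intellectual_disability": 5.0,
    "impaired_mobility": 4.0,
    "sensory_impairment": 3.0,
    "reduced_fertility": 1.0,
}


def severity_score(annotations: Dict[str, str]) -> float:
    """Combine characteristic type (weight) and frequency into a 0-100 score.

    `annotations` maps characteristic -> frequency label. Characteristics that
    are absent count as "never"; keys not in CHARACTERISTIC_WEIGHTS are ignored.
    An unrecognised frequency label raises KeyError.
    """
    max_freq = max(FREQUENCY_VALUES.values())
    # Largest possible total: every characteristic occurs "always".
    max_total = sum(weight * max_freq for weight in CHARACTERISTIC_WEIGHTS.values())
    total = 0.0
    for characteristic, weight in CHARACTERISTIC_WEIGHTS.items():
        label = annotations.get(characteristic, "never")
        total += weight * FREQUENCY_VALUES[label]
    return 100.0 * total / max_total


def rank_phenotypes(annotated: Dict[str, Dict[str, str]]) -> List[Tuple[str, float]]:
    """Return (phenotype, score) pairs sorted from most to least severe."""
    scores = [(term, severity_score(ann)) for term, ann in annotated.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented example annotations for two hypothetical phenotypes.
    example = {
        "phenotype_A": {"death": "rarely", "intellectual_disability": "always",
                        "impaired_mobility": "often"},
        "phenotype_B": {"reduced_fertility": "often"},
    }
    for term, score in rank_phenotypes(example):
        print(f"{term}: {score:.1f}")
```

With these placeholder values, `phenotype_A` (about 50.9) ranks above `phenotype_B` (about 3.5). Both phenotypes and their annotations are invented for illustration. A weighted sum is only one design choice, and the study's actual weighting may differ.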

Interpretation:

The study demonstrates that large language models (LLMs) such as GPT-4 can automate the curation of clinical metadata, substantially reducing the need for manual expert annotation and improving efficiency in clinical settings.

Limitations:

The study primarily focused on phenotypic severity without exploring other clinical dimensions, which may limit the applicability of the findings.
Validation against independent clinical datasets is needed to further confirm the annotations and establish their robustness and reliability.

Conclusion:

The findings provide a foundation for systematically ranking human phenotypes by their health impact, aiding in therapeutic prioritization for rare diseases and enhancing clinical decision-making.

Original Source(s)

Using GPT-4 to annotate the severity of all phenotypic abnormalities within the human phenotype ontology