Evaluating the performance of general purpose large language models in identifying human facial emotions

By
Benjamin W. Nelson
Ari Winbush
Steven Siddals
Matthew Flathers
Nicholas B. Allen
John Torous
October 16, 2025

npj Digital Medicine

Objective:

To evaluate how well three leading LLMs (GPT-4o, Gemini 2.0 Experimental, and Claude 3.5) recognize human facial expressions in the NimStim dataset, as a measure of their socioemotional competence.

Key Findings:

GPT-4o and Gemini 2.0 Experimental matched or exceeded human performance, particularly for calm/neutral and surprise expressions, with GPT-4o achieving the highest overall accuracy.
Overall accuracy was 86% for GPT-4o, 84% for Gemini 2.0 Experimental, and 74% for Claude 3.5, so the top two models were close while Claude 3.5 trailed them by 10 to 12 percentage points.
Fear was frequently misclassified as surprise across models, highlighting a common area of error.
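
Overall accuracy and a fear-to-surprise confusion count like those reported above are typically derived from paired true and predicted labels. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count each (true, predicted) pair; pairs that differ are errors."""
    return Counter(zip(true_labels, predicted_labels))


# Toy labels invented for illustration only (NOT data from the study).
true_labels = ["fear", "fear", "fear", "surprise", "happy", "calm"]
predicted = ["surprise", "fear", "surprise", "surprise", "happy", "calm"]

overall = accuracy(true_labels, predicted)
counts = confusion_counts(true_labels, predicted)
fear_as_surprise = counts[("fear", "surprise")]

print(f"overall accuracy: {overall:.2f}")
print(f"fear labeled as surprise: {fear_as_surprise}")
```

Keeping the full table of (true, predicted) pairs, rather than a single accuracy number, is what makes a systematic error such as fear being read as surprise visible.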

Interpretation:

The findings suggest that general-purpose LLMs are developing socioemotional competence, which could support healthcare applications such as recognizing signs of mental health conditions like depression and anxiety.

Limitations:

All stimuli were static images, limiting generalizability to dynamic expressions.
Actors were predominantly aged 21-30 and European American, which may affect results and applicability to diverse populations.
The study relied on a single dataset, so the results need validation on other, more varied datasets before they can be generalized.

Conclusion:

While LLMs show promise in facial expression recognition, further research is needed to enhance generalizability and explore multimodal emotion classification, particularly in diverse contexts.

Original Source(s)

Evaluating the performance of general purpose large language models in identifying human facial emotions
