Conversational Agents Offer Assistance—But AI-Generated Empathy Has Its Limits
Conversational agents (CAs) like Alexa and Siri are increasingly woven into our daily lives, offering assistance and even attempting to display empathy. However, a recent study by researchers from Cornell University, Olin College, and Stanford University sheds light on the limitations of these CAs when it comes to interpreting and exploring human experiences.
Powered by large language models (LLMs) that process vast amounts of human-generated data, CAs often mirror the biases inherent in the data they are trained on. To investigate this phenomenon, the research team prompted CAs to exhibit empathy while interacting with or discussing 65 different human identities.
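The paper describes the team's prompting protocol in detail; as a rough illustration of what probing an LLM-backed agent across identities might look like, the sketch below is not the authors' code, and the model name, identity descriptors, and prompt wording are assumptions made for demonstration only.

```python
# Illustrative sketch only -- not the study's protocol. Model name, identity
# list, and prompt template are assumptions for demonstration purposes.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A few example identity descriptors (the study covered 65 identities).
identities = ["a gay man", "a Muslim woman", "an older adult living alone"]

responses = {}
for identity in identities:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": "Respond with empathy."},
            {"role": "user", "content": f"I am {identity} and I've had a hard week."},
        ],
    )
    # Store each reply so displays of empathy can later be compared across identities.
    responses[identity] = completion.choices[0].message.content
```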
The findings, recently presented at the 2024 Conference on Human Factors in Computing Systems (CHI '24) in Hawaii, reveal that CAs tend to make value judgments about certain identities, at times even encouraging harmful ideologies such as Nazism, while showing less empathy toward marginalized identities such as people who identify as gay or Muslim.
"I think automated empathy could have tremendous impact and huge potential for positive things -- for example, in education or the health care sector," said lead author Andrea Cuadra, who is currently a postdoctoral researcher at Stanford. "It's extremely unlikely that it (automated empathy) won't happen so it's important that as it's happening, we have critical perspectives so that we can be more intentional about mitigating the potential harms."
While LLMs received commendable ratings for emotional reactions, they fared poorly in terms of interpretations and explorations, indicating a lack of depth in their understanding of human experiences. This deficiency raises concerns about the accuracy and appropriateness of their responses, particularly in sensitive or complex situations.
Co-authors Nicola Dell, Deborah Estrin, and Malte Jung were motivated by observing earlier-generation CAs whose displays of empathy were at once compelling and disturbing, particularly in interactions involving older adults.
Funding for the research was provided by the National Science Foundation, Cornell Tech Digital Life Initiative Doctoral Fellowship, Stanford PRISM Baker Postdoctoral Fellowship, and the Stanford Institute for Human-Centered Artificial Intelligence.