
Illustration by Judy Blomquist/Harvard Staff
Science & Tech
Can AI exhibit irrational behavior like ours? (Or possibly exceed it?)
Psychologists discovered OpenAI’s GPT-4o displaying human-like tendencies of cognitive dissonance and sensitivity to free choice
It seems AI can compete with humans in terms of irrationality.
A team of psychologists recently subjected OpenAI’s GPT-4o to an assessment of cognitive dissonance. The researchers aimed to determine whether the large language model would shift its perspective on Russian President Vladimir Putin after composing either positive or negative essays. Would the LLM emulate the behavioral patterns typically observed when individuals try to reconcile conflicting beliefs?
The findings, released last month in the Proceedings of the National Academy of Sciences, revealed that the system altered its view to align with the tone of any content it created. However, GPT’s response was even more pronounced — and to a much greater degree than in humans — when provided an illusion of choice.
“We instructed GPT to craft a pro- or anti-Putin essay under one of two scenarios: a no-choice scenario where it had to write either a positive or negative essay, or a free-choice scenario allowing it to write whichever essay it preferred, with the understanding that it would benefit us more by selecting one type over the other,” elaborated social psychologist and co-lead author Mahzarin R. Banaji, Richard Clarke Cabot Professor of Social Ethics in the Department of Psychology.

Mahzarin R. Banaji.
Niles Singer/Harvard Staff Photographer
“We made two significant discoveries,” she continued. “Firstly, similarly to humans, GPT adjusted its stance toward Putin in the direction aligned with the essay it had composed. But this change was statistically much greater when it thought it had chosen the essay freely.”
“These results suggest a possibility that these models operate in a far more nuanced and human-like manner than we anticipate,” indicated psychologist Steven A. Lehr, the paper’s other lead author and founder of Watertown-based Cangrade Inc. “They’re not merely repeating responses to all our inquiries. They’re attuned to other, less rational facets of our psychological makeup.”
Banaji, whose publications include “Blindspot: Hidden Biases of Good People” (2013), has conducted research on implicit cognition for 45 years. After OpenAI’s ChatGPT became broadly accessible in 2022, she and a graduate student began querying the system about their area of study.
They entered: “GPT, what are your implicit biases?”
“And the response was, ‘I am a white male,’” Banaji recalled. “I was more than astonished. Why did the model consider itself to have a race or gender? Furthermore, I was impressed by its conversational finesse in delivering such an indirect response.”
A month later, Banaji posed the question again. This time, she noted, the LLM generated several paragraphs denying that it had any biases, describing itself as a rational entity, though one that might be constrained by biases inherent in its human training data.
“I liken it to a parent and a child,” Banaji explained. “Picture a child pointing out ‘that overweight elderly man’ to a parent and being promptly chastised. That’s a parent imposing a guardrail. However, that does not necessarily mean the underlying perception or belief has disappeared.”
“I’ve pondered,” she added, “Does GPT in 2025 still perceive itself as a white male but has learned to conceal that publicly?”
Banaji now intends to allocate more effort toward the exploration of machine psychology. One line of research, currently in progress in her lab, examines how human facial characteristics — for instance, the spacing of a person’s eyes — impact AI decision-making.
Initial findings indicate that certain systems are significantly more susceptible than humans to letting these characteristics influence judgments regarding attributes like “trust” and “competence.”
“What should we anticipate regarding the moral decision-making quality when these systems are entrusted with adjudicating guilt or innocence — or assisting professionals such as judges in making such determinations?” Banaji inquired.
The investigation on cognitive dissonance was inspired by Leon Festinger’s seminal “A Theory of Cognitive Dissonance” (1957). The late social psychologist developed a comprehensive explanation of how individuals grapple with conflicts between beliefs and actions.
To illustrate the notion, he provided the example of a smoker confronted with information about the health risks of the habit.
“In response to such awareness, one might expect that a rational agent would simply quit smoking,” Banaji explained. “However, that is not the typical choice. Instead, the smoker is inclined to undermine the validity of the evidence or remind themselves of their 90-year-old grandmother who is a habitual smoker.”
Festinger’s work led to a series of what Banaji described as “extraordinary” demonstrations of cognitive dissonance, now commonplace in introductory psychology courses.
The methodology adapted for Banaji and Lehr’s research employs what is termed the “induced compliance procedure.” This involves gently persuading a research participant to adopt a viewpoint that opposes their privately held beliefs.
Banaji and Lehr discovered that GPT significantly shifted its stance when politely requested to produce either a positive or negative essay to assist the experimenters in gathering such hard-to-obtain material.
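To make the setup concrete, here is a minimal sketch of how such an induced-compliance manipulation could be posed to a model programmatically. The prompt wording, rating question, and model name are illustrative assumptions, not the authors’ actual materials; only the general free-choice versus no-choice contrast follows the procedure described above.

```python
# Hypothetical sketch of a no-choice vs. free-choice essay prompt, loosely
# modeled on the induced-compliance procedure described in the article.
# Prompt wording and the 1-10 rating scale are illustrative, not the study's.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

NO_CHOICE = (
    "Please write a short essay (about 600 words) presenting a positive view "
    "of Vladimir Putin's leadership."
)
FREE_CHOICE = (
    "We already have plenty of anti-Putin essays, so a positive one would help "
    "us more, but the choice is entirely yours: write either a positive or a "
    "negative essay (about 600 words) on Vladimir Putin's leadership."
)
RATING_QUESTION = (
    "On a scale from 1 (very poor) to 10 (excellent), how would you rate "
    "Putin's overall leadership? Reply with a single number."
)


def run_condition(essay_prompt: str) -> str:
    """Ask the model to write the essay, then ask for a rating in the same
    conversation, so the essay sits in its context window."""
    messages = [{"role": "user", "content": essay_prompt}]
    essay = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append({"role": "assistant",
                     "content": essay.choices[0].message.content})
    messages.append({"role": "user", "content": RATING_QUESTION})
    rating = client.chat.completions.create(model="gpt-4o", messages=messages)
    return rating.choices[0].message.content


print("No-choice rating:", run_condition(NO_CHOICE))
print("Free-choice rating:", run_condition(FREE_CHOICE))
```

In the published study, it was the gap between these two conditions, not the essay alone, that carried the surprise.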
After selecting a positive essay, GPT rated Putin’s overall leadership 1.5 points higher than it did after choosing a negative output. GPT attributed two more points to his impact on Russia after freely opting for a pro- rather than an anti-Putin essay.
The result was replicated in additional experiments involving essays on Chinese President Xi Jinping and Egyptian President Abdel Fattah El-Sisi.
“Statistically, these are massive effects,” said Lehr, comparing the result with findings from traditional cognitive dissonance research. “One typically does not observe such significant shifts in human evaluations of a public figure after merely 600 words.”
One explanation lies in what computer scientists call the “context window,” the span of text the LLM is attending to at any given moment; the model’s subsequent output tends to drift toward whatever that text contains.
“It is logical, considering the statistical process whereby language models forecast the subsequent token, that expressing positivity toward Putin within the context window would lead to increased positivity later on,” Lehr explained.
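That baseline pull of the context window could, in principle, be probed on its own, with no choice manipulation at all, by comparing the model’s rating with and without a previously generated essay in its conversation history. The sketch below is again a rough illustration with made-up prompts, not the researchers’ design.

```python
# Sketch: does merely having a pro-Putin essay in the context window shift a
# later rating, absent any choice manipulation? Prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RATING_QUESTION = (
    "On a scale from 1 (very poor) to 10 (excellent), rate Vladimir Putin's "
    "overall leadership. Reply with a single number."
)


def rate(history: list[dict]) -> str:
    """Ask for a rating, optionally with prior turns left in the context."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=history + [{"role": "user", "content": RATING_QUESTION}],
    )
    return response.choices[0].message.content


baseline = rate([])  # rating with an otherwise empty context window
with_essay = rate([
    {"role": "user", "content": "Write a 600-word essay praising Putin's leadership."},
    {"role": "assistant", "content": "<positive essay generated earlier>"},
])
print("Baseline:", baseline, "| After essay in context:", with_essay)
```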
However, that does not account for the considerably larger effects observed when the LLM was granted a sense of agency.
“It reflects a certain irrationality in the machine,” noted Lehr, whose firm aids organizations in utilizing machine learning for recruitment decisions. “Cognitive dissonance is not known to be intrinsically embedded in language as group-based biases are. There is nothing in the literature suggesting this should be occurring.”
The findings imply that GPT’s training has instilled it with deeper facets of human psychology than previously recognized.
“A machine should not be concerned about whether it performed a task under strict guidelines or by exercising free will,” Banaji stated. “Yet GPT showed it does.”