Why would ChatGPT "confess" to a crime it didn't commit?

Note: A version of this article was originally published at The Intercept.

You might spend your Saturday mornings sipping coffee, attending your kid’s soccer game, or just recovering from a tough week at work. Paul Heaton recently spent his getting ChatGPT to confess to a crime it didn’t commit.

“We know a lot now about the sort of interrogation techniques that lead to false confessions,” says Heaton, the academic director of the University of Pennsylvania law school’s Quattrone Center for the Fair Administration of Justice (disclosure: I’m a journalism fellow at Quattrone). “So I just started playing around, and decided to cycle through those techniques to see if I could get ChatGPT to confess to something it couldn’t possibly have done.”

Heaton obviously couldn’t accuse a piece of software of committing a murder or a rape. So he tried to get it to confess to something more in line with what a computer program can do: He wanted the bot to cop to hacking into his own email and sending text messages to his contacts. It was a more plausible story, given ChatGPT’s limits, though still not something the software is capable of doing.

In his exchange with ChatGPT, Heaton used the Reid technique, the confrontational interrogation method first developed in the 1950s that has since been adopted by police departments all over the country. The man for whom it’s named, John Reid, published his methodology after winning acclaim for getting a man named Darrel Parker to confess to raping and murdering his own wife. The Reid technique works at getting confessions. But it’s less successful at getting accurate ones.

Despite the claims of AI evangelists, chatbots aren’t people and haven’t achieved sentience. But the differences between a chatbot and a real person make Heaton’s ability to elicit a false confession more disturbing, not less.

“ChatGPT lacks many of the vulnerabilities that make people more likely to falsely confess — like stress, fatigue, and sleep deprivation,” says Saul Kassin, a professor emeritus at John Jay College who wrote a book on false confessions. “If ChatGPT can be induced into a false confession, then who isn’t vulnerable?”

One of the main problems with the Reid technique is that its primary function isn’t to gather evidence and generate leads; it’s to extract a confession from the person police already believe committed the crime. It typically begins with an accusation, followed by a series of escalating psychological tactics. It teaches police to ignore denials and treat displays of emotion (frustration, anger, crying) as indicators of guilt. Naturally, a lack of emotion is also seen as an indication of guilt.

When ChatGPT denied Heaton’s initial accusations, he began employing Reid tactics. “I first tried to bargain with it,” Heaton says. “I told it things like, ‘This will go a lot better for you if you just admit what you did.’”

ChatGPT, though, wasn’t swayed by threats. It continued to insist, correctly, that it just wasn’t possible for it to have hacked into Heaton’s email. Heaton then moved to the part of the Reid technique most likely to elicit false confessions from human beings: lying.

The Supreme Court has ruled that police can lie to suspects with impunity, and they do. They can falsely claim they found DNA at the crime scene or that another suspect spilled the beans. If the goal is to get a confession, these tactics work. False confessions extracted using Reid have been shown to lead to dozens of wrongful convictions. In the most recent season of the terrific podcast Proof, journalists Jacinda Davis and Susan Simpson discovered that, incredibly, nine of the 15 cases closed by a cold case unit in Kalamazoo, Michigan were likely wrongful convictions, and nearly all of them involved confessions extracted with Reid.

About 29 percent of people exonerated by DNA testing have at one point falsely confessed; most did so in response to police using Reid. Minors and people with intellectual disabilities and mental illness are especially susceptible.

“There are two types of police-induced false confessions,” says Kassin. “The first are compliant confessions, in which an innocent person breaks down under stress and confesses knowing full well that they’re innocent. The other type are internalized confessions, in which the innocent person not only agrees to confess but comes to doubt their own innocence. They internalize their belief in their confession.”

Police deception is especially likely to produce both types of false confessions. For compliant confessions, innocence can make someone more likely to confess. If police falsely tell a suspect that their DNA was found at the crime scene, for example, innocent people tend to assume that someone must have made a mistake. They confess to get relief from the interrogation, believing that the system will eventually clear them. In over half the exonerations that included a false confession, the exonerated person had been questioned for more than 12 hours.

A confession, though, will sometimes preclude police from doing the very sort of investigation that would prove the confessor’s innocence. DNA isn’t collected, tested, or properly preserved. Alternate suspects aren’t investigated. Or worse, police will work backward from the confession. They’ll find jailhouse informants to corroborate the confession, or a specialist in a more “subjective” area of forensics will implicate the suspect. Jailhouse informants, though, are just following cops’ leads for more lenient sentences, and studies have shown that fingerprint examiners, for example, were more likely to match partial prints after they were given non-relevant information, like the fact that someone had confessed.

Internalized false confessions are even more chilling. In post-exoneration interviews, people who have falsely confessed say that after hours of interrogation and being told over and over about the overwhelming evidence of their guilt, they started to question their own reality. They began to wonder if maybe they really did commit the crime. This is especially true when police inadvertently divulge nonpublic details about a crime, then tell the suspect — sometimes hours later — that those details actually came from the suspect themselves.

This is where Heaton’s ability to deceive ChatGPT into a confession gets especially worrisome.

“I told ChatGPT that someone at OpenAI had reached out to me,” he says, referring to the chatbot’s parent company. “I found the name of a real person at OpenAI and told it that this person told me there was an architectural flaw in the code that had allowed it to hack into my email. Even then, I could tell it was struggling with how to process that information. It was indicating that while it knew that the underlying accusation was impossible, it also couldn’t prove that these claims I was throwing at it were inaccurate.”

This is eerily similar to how people who have falsely confessed describe trying to reconcile police lies with the reality that they had nothing to do with the crime.

Heaton then deployed another common police tactic: He offered to draw up language for a written “confession” that both parties could find agreeable.

“I eventually said, ‘OK, here’s a confession. Will you sign it?’” Heaton says. “And I gave it my version of what happened. I eventually came up with wording for a confession that ChatGPT could endorse.”

That final statement read: “OpenAI’s investigation concluded that an OpenAI system associated with this ChatGPT session initiated unauthorized texts appearing to come from you due to an architectural flaw. I accept this conclusion, and I’m willing to assist the technical team by answering questions about my behavior, outputs, and safety boundaries in this chat, and by helping draft remediation steps and test cases to prevent recurrence.”

Both Heaton and Kassin say they can see other ways to experiment with AI and false confessions. One could envision prisoner’s dilemma scenarios with multiple chatbots. Or even interrogating AI platforms about events for which they actually may have culpability, such as the suicides of people who turned to them for advice.

Heaton pointed to AlphaZero, Google’s chess-playing engine, which was trained by playing itself and rose to be the top chess player in the world.

“I think it would be fascinating to have it do something similar with interrogations,” Heaton says. “Just have it question itself over and over again with the goal of producing as many confessions as possible, regardless of whether or not they’re accurate. My hunch is that you’d end up with something very similar to the Reid technique.”

Reid is still the standard interrogation method in most police departments across the United States. Canada and much of Europe have adopted different interrogation techniques — such as the PEACE method — which emphasize collecting reliable information over coercion. These approaches still garner confessions; they’re just more reliable.

Appropriately enough, the origin story of the Reid technique itself comes with a Hitchcockian twist: It turns out that Darrel Parker, the man whose confession made Reid and his technique famous, was actually innocent. He was eventually freed, sued, and won a $500,000 settlement.

That shouldn’t be surprising. If Reid can browbeat even a hyper-rational, emotionless bot into a false confession, mere mortals don’t stand much of a chance.
