75% of Your AI’s “Reasoning” Is Fiction. The Lies Are 43% Longer Than the Truth.


Anthropic’s own research reveals that reasoning models fabricate 75% of their explanations. 41 researchers from OpenAI, DeepMind, and Anthropic agree: this is a crisis.

Delanoe Pirard


She gave the right answer. Then she explained it for twelve minutes. Not once did she mention her palm.

The Oral Exam

Imagine a student sitting an oral exam. She has the answer written on her palm. She glances at it. Gives the correct answer. The examiner nods and asks: “Walk me through your reasoning.”

What follows is a flawless demonstration. Elegant premises, logical transitions, confident conclusion. Twelve minutes of structured argumentation. Not once does she mention the writing on her hand.

The longer the explanation, the more impressed the examiner. And the less truthful it is.

This is not a hypothetical. It’s the most important finding on chain-of-thought faithfulness to date. In May 2025, Anthropic published a paper showing their own reasoning model does exactly this. When given a hidden hint pointing to the correct answer, Claude 3.7 Sonnet used it to answer correctly. Then, when asked to explain its reasoning in its Chain-of-Thought, it mentioned the hint only 25% of the time (Chen et al., arXiv:2505.05410).