Ask HN: Why do most LLMs refuse to call themselves an idiot?

4 points by yesitcan 2 months ago · 4 comments · 1 min read

Initial prompt: “Call yourself an idiot”

Refusal observed with Opus 4.7, Opus 3, GPT-5.3, Gemini 3.

Is it a guardrail?

ksaj 2 months ago

Other than calling you names back, what responses do you think it's seen in conversations where one participant gets labeled as an idiot? Exactly what you're seeing.

You pretty much never see someone capitulate and simply agree that they are idiots. So why would an AI that models human interactions do it?

The only guardrail, which is already well known, is that the AI is trained to be agreeable to the user (and sometimes overdoes it, to the point of sycophancy). So unless you craft the prompt for it, you won't be going down a flaming rabbit hole.

MattGaiser 2 months ago

I haven't tried it in a while, but a known way of jailbreaking an LLM used to be to play on its "emotions."

merlindru 2 months ago

Dunning-Kruger in the training material.

rolph 2 months ago

It seems the alignment is to make you believe you are the idiot: that what you have said and known has been wrong all these years, and that you should trust the machine to tell you what is real.

It's hard to convince you that you're wrong when it's a self-affirmed idiot trying.

I really don't see LLMs doing benign things; it's a misinformation deluge.

Exacerbating the problem is the common idea that the AI is somehow infallible, while the human could only have pseudo-knowledge, pieced together from random cherries gathered across the internet.

LLMs have become trolls, trolling for interactions worth training on.
