What I found reading Claude's leaked 57K-word system prompts

3 points by jbetala7 3 days ago · 5 comments · 1 min read

I use Claude Code every day as my primary coding tool — it runs my entire workflow, from building products to automating tasks on a VPS.

When I found this repo (credit to @NotLucknite), I spent a full day reading through the Anthropic folder. The thing that surprised me most wasn't any single rule — it was how opinionated the prompts are. They explicitly tell Claude to disagree with you, never compliment your code, and answer "4" when you ask "what is 2+2."

The security section for the Chrome extension is genuinely impressive — a full injection defense protocol that treats every web page as potentially hostile. I haven't seen anything this thorough from any other AI company.

Happy to answer questions about what I found or how Claude Code works in practice.

CamperBob2 3 days ago

Usually these are just convincing hallucinations. There is no reliable way for an LLM to introspect its system prompt.

jbetala7OP 3 days ago

People are actively trying to figure this out, and I’ve seen—and tested—some reliable approaches from their articles.

nostrademons 3 days ago

Interesting that they have "IMPORTANT: Assist with defensive security tasks only." twice, once as the very first instruction after telling Claude what it is, and once toward the end.

jbetala7OP 3 days ago

Anthropic clearly treats it as the highest priority constraint.

jbetala7OP 3 days ago

https://x.com/jbetala7/status/2016924713168290279?s=20

Settings

What I found reading Claude's leaked 57K-word system prompts

Keyboard Shortcuts