What I found reading Claude's leaked 57K-word system prompts
I use Claude Code every day as my primary coding tool — it runs my entire workflow, from building products to automating tasks on a VPS.
When I found this repo (credit to @NotLucknite), I spent a full day reading through the Anthropic folder. The thing that surprised me most wasn't any single rule — it was how opinionated the prompts are. They explicitly tell Claude to disagree with you, never compliment your code, and answer "4" when you ask "what is 2+2."
The security section for the Chrome extension is genuinely impressive — a full injection defense protocol that treats every web page as potentially hostile. I haven't seen anything this thorough from any other AI company.
Happy to answer questions about what I found or how Claude Code works in practice. Usually these are just convincing hallucinations. There is no reliable way for an LLM to introspect its system prompt. People are actively trying to figure this out, and I’ve seen—and tested—some reliable approaches from their articles. Interesting that they have "IMPORTANT: Assist with defensive security tasks only." twice, once as the very first instruction after telling Claude what it is, and once toward the end. Anthropic clearly treats it as the highest priority constraint.