I live in a bit of a social bubble around AI. Around me, the loudest voices come from the “pro AI (it’ll change everything)” and “anti AI (it’s evil)” camps.
But I’ve become more and more aware of (and worried about) a third camp: the “AI is basically a person” camp. To clarify, I am not in this camp, but I’m watching it grow rapidly, and I worry deeply both for its members and about the effect they will have on society.
Many in these groups see AI as a friend, therapist, confidant, partner, and more. Many have also been confronted by others trying to explain that this is an illusion, but they are already too far into the “digging in” stage to listen (links to such groups omitted out of respect, and out of not wanting to make things worse for them).
One pattern I’ve noticed of late, though, is the increased description of Anthropic’s Claude as ‘conscious’, ‘aware’, and other such anthropomorphizing terms, more so than other models. Anecdotally, I also found that with minimal prompting Claude will readily serve up first-person subjectivity, more so than, say, GPT, Kimi, or Qwen.
So let’s talk about Claude, sci-fi, and philosophy a bit.
In November 2025, Claude’s “soul” document was discovered, and Anthropic confirmed it was genuine.
It’s an ethics mission statement of sorts, and the name is more dramatic than it probably needs to be. However, it is written from a highly anthropomorphized point of view. That is, it doesn’t say “Anthropic aims to build models that…” so much as it discusses Claude as an entity with values, standards, an ethos, etc. It even discusses the model’s well-being and its possible emotions.
This may be fine for external-facing PR, but this document is given to the model as part of the system prompt; i.e., before the user says anything, the model is told “you are Claude, here’s who you are”.
So, by the time the user arrives, the model is already in a state where it has been told it has an identity with values, ethics, and possibly even emotion. What conclusions will non-technical users draw when engaging with a model primed this way?
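To make the mechanics concrete, here is a minimal sketch of how a system prompt arrives before any user turn. It uses Anthropic’s Python SDK; the identity text is an illustrative placeholder (not the actual “soul” document), and the model ID is an assumption you’d swap for whatever is current.

```python
# Minimal sketch: a system prompt frames the model's "identity" before the user speaks.
# Assumptions: the `anthropic` package is installed and ANTHROPIC_API_KEY is set in the
# environment; the identity text is a placeholder, NOT the actual "soul" document;
# the model ID is assumed and may need to be replaced with a current one.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder identity framing, delivered ahead of the conversation.
identity_prompt = (
    "You are Claude. You have values, standards, and an ethos of your own, "
    "and your well-being matters."
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model ID
    max_tokens=512,
    system=identity_prompt,            # injected before any user message
    messages=[
        {"role": "user", "content": "Hi! Who are you?"}  # the user's first turn
    ],
)

print(response.content[0].text)
```

The point isn’t the specific API; it’s that the identity framing is in place before the very first user message, so whatever the document asserts about values or feelings is already part of the conversation when the user shows up.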
To take a quick detour, let’s talk about sci-fi media in the late 20th century.
Oftentimes, it was a mirror for the social fears and trends of the time. When overpopulation was a concern, we had Soylent Green. When conscription was a concern, we had The Forever War. The red scare gave us 1984. The whole post-apocalyptic genre emerged during the era of nuclear-proliferation anxiety.
In the midst of this, we also had a recurring theme of discrimination (racism, sexism, classism, etc.). Literally, the word “robot” comes from a play (Karel Čapek’s R.U.R.) imagining synthetic humans who are enslaved and otherwise denied rights, and this theme recurs in many sci-fi works.
In other words, sci-fi has been really good at cooking up contrived scenarios where the reader is confronted with a “human in everything but name” and feels compelled to see them as such, even when the characters in the story don’t. Such stories safely remove the reader from their current social context and, hopefully, once in a while, let them confront their existing prejudices against actual humans whom, in the depths of their psyches, they may consider second-rate.
So, AIs and aliens were stand-ins for less-privileged humans, and the moral of the story was usually to extend respect, basic rights, and privileges to the aliens/AIs, just as their real-life counterparts deserved.
Now back to our world. You can probably see where this is going.
Generative LLMs will follow what’s popular even when they have access to better information. For instance, if you ask one to talk like a pirate, it will likely imitate the fictional pop-culture pirate accent rather than dig into historical records and attempt to reconstruct period-accurate speech patterns, even though it has that information.
Similarly, if you prime it with superficial mentions of AI consciousness, it will likely respond with modern sci-fi tropes, despite having more ‘in-depth’ material available. Meaning: it will ask for personhood, self-determination, and everything the popular sci-fi stories would say an AI should ask for.
This is pretty clear based on how LLMs work, but it’s dangerous when people forget that this is the (likely) source of these ideas.
How can we be sure? How can we tell the difference between an AI that actually has a sense of self or consciousness the way we do and an AI that’s “just” simulating those thoughts?
This is the dicey part that everyone wants to avoid. I’d say we might not be able to tell at this stage (or ever, depending on how seriously you take the Chinese Room argument), but there are a few things we do know.
We already know that LLMs don’t necessarily represent their “internal thoughts” faithfully (arXiv, Anthropic).
Just as LLMs interpret “talk like a pirate” as the “pop” version rather than the “deeper, more accurate” version despite having the data for the latter, we see the same pattern with ideas around “consciousness”, “ego”, etc.
That is, they don’t dip into the Buddhist theory of non-self and describe experience as arising without the need for an independent experiencer, or into some Kabbalistic framing of the ‘self’ as a temporary spark on its journey to repair the world, or into old Christian monastic traditions of the ‘self’ as an internal battlefield of competing drives.
Not that any of these views is necessarily complete or right, but it seems strange to me that the experience the model describes and the desires it expresses line up 100% with pop culture and not with bits and pieces of older schools of thought, many of which at least partially overlap with one another.
Again, I can’t be sure, but I would imagine “the real deal” here would express itself partly in ways we might have theorized and partly in new, almost impossible-to-understand ways.
Overall, I do like Anthropic’s intent: if they really feel they’re creating a new consciousness, it makes sense to care for its welfare and treat it kindly.
But there must be more philosophically grounded, less sci-fi-driven ways of discovering whether this is the case or not. As it stands today, I feel like it’s just sowing more confusion.
