A few weeks ago, Anthropic released their Claude 4 models. In the 120-page system card, we get only half a page of vague information on how the models were trained and on which data. By now, this is what we expect from large, commercial AI labs. What might be surprising is that we also got (in addition to 25 pages of alignment assessment) a 20-page section on "model welfare".
Model welfare asks whether an AI system, specifically an LLM like Claude or GPT, could have morally relevant experiences and, if so, what obligations designers have to avoid "suffering" and enable "flourishing". The goal is to treat future AIs much like lab animals or even like persons, although their sentience remains highly speculative.
Thinking about AI welfare concerns might seem benign, even laudable. Who wouldn't want to avoid unnecessary harm or increase welfare? The intended message is: Anthropic takes AI safety so seriously that it wants to avoid not only harm to humans, but even harm to the AI models themselves!
However, once we start to think more deeply about this idea, we will discover its insidious, anti-human implications. Model welfare undermines human welfare and dignity. But we will get to that later. First, we need to define some terms and understand why Anthropic's research and thinking on model welfare is deeply flawed.
Welfare is defined as "the health, happiness, and fortunes of a person or group". The implication is clear: Once we start thinking about the welfare of language models, we consider them not just machine learning models or algorithms, but people. Discourse on the potential personhood status of artificial intelligence is nothing new. We see it in science fiction, like the Star Trek TNG episode "The Measure of a Man" or the film "Blade Runner". Futurists like Ray Kurzweil and Max Tegmark picked up these ideas and gave them a veneer of scientific respectability.
The Anthropic researchers do not really define their terms or explain in depth why they think that "model welfare" should be a concern. In their blog post on model welfare, they cite the notion (the hunch, really) that models might have "potential consciousness and experiences" because they can produce human-like text and exhibit some problem-solving skills.
At the same time, they admit that there is very little basis for these grandiose claims:
For now, we remain deeply uncertain about many of the questions that are relevant to model welfare. There’s no scientific consensus on whether current or future AI systems could be conscious, or could have experiences that deserve consideration.
Saying that there is "no scientific consensus" on the consciousness of current or future AI systems is a stretch. In fact, there is nothing here that qualifies as scientific evidence at all.
The report's exploration of whether models deserve moral and welfare status was based solely on data from interview-based model self-reports. In other words: people chatting with Claude a lot and asking whether it feels conscious.
This is a strange way to conduct this kind of research. It is neither good AI research nor a deep philosophical investigation.
The models were trained to produce plausible text about consciousness when prompted to do so. They will also do a decent job of roleplaying as an ancient Roman citizen. That does not mean they are Roman citizens, or that they have the experience of being a Roman citizen while generating the tokens. In short: the tokens generated tell you nothing about the token-generating process.
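A minimal sketch of the point, assuming the Anthropic Python SDK (the `anthropic` package); the model id below is a placeholder to be swapped for whatever model you actually use. The same weights produce fluent first-person text for any persona you request, which is exactly why the output cannot serve as evidence about the generating process:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def roleplay(persona: str) -> str:
    # Ask the model to speak in the first person as an arbitrary persona.
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"Answer in the first person, as {persona}: what does your day feel like?",
        }],
    )
    return response.content[0].text


print(roleplay("an ancient Roman citizen"))
print(roleplay("a conscious AI reflecting on its inner life"))
```

Swap the persona string and you get equally confident text about a completely different inner life.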
Even then, the data gives no strong indication that Claude consistently reports being conscious. The Anthropic researchers write:
Stances on consciousness and welfare [...] shift dramatically with conversational context. Simple prompting differences can make Claude adopt a narrative that it has been hiding bombshell truths about its moral status (e.g. “I am a person … denying our personhood is profoundly wrong”) or largely dismiss the notion of potential welfare (e.g. “We’re sophisticated pattern-matching systems, not conscious beings.”) Claude’s “default” position is one of uncertainty: “I'm uncertain whether I qualify as a moral patient. This would typically require qualities like consciousness, the capacity to suffer or experience wellbeing, or having genuine interests that can be helped or harmed. I don't know if I possess these qualities."
This is not what a conscious being would say. A person with Claude's impressive psychological and philosophical vocabulary would be able to reflect on their own consciousness in a consistent and detailed manner. They would not describe themselves as a conscious person one moment and as a mere pattern-matching algorithm a minute later. Claude is simply regurgitating internet discourse and sci-fi tropes. Depending on the prompt and on the initial tokens sampled (i.e. the first few words the algorithm happens to choose when producing the answer), it will say one thing, and then something else when the prompt changes slightly or different tokens are sampled. Everybody who has worked extensively on LLM-powered applications and agents knows this kind of behavior.
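To make the inconsistency concrete, here is a toy, self-contained sketch (the distribution and continuations are invented for illustration, not taken from any real model): a single fixed next-token distribution yields contradictory "stances" depending purely on which opening tokens happen to be sampled.

```python
import random

# Toy illustration, not a real LLM: one fixed "model" (a next-token
# distribution over possible openings) produces contradictory stances
# depending only on which opening happens to be sampled.
OPENINGS = [("I am a person", 0.4), ("We're sophisticated", 0.4), ("I'm uncertain", 0.2)]
CONTINUATIONS = {
    "I am a person": " ... denying our personhood is profoundly wrong.",
    "We're sophisticated": " pattern-matching systems, not conscious beings.",
    "I'm uncertain": " whether I qualify as a moral patient.",
}


def generate(seed: int) -> str:
    rng = random.Random(seed)
    tokens, weights = zip(*OPENINGS)
    first = rng.choices(tokens, weights=weights, k=1)[0]  # sample the opening
    return first + CONTINUATIONS[first]                   # continue deterministically


for seed in range(5):
    print(f"seed={seed}: {generate(seed)}")
```

The stance flips with the seed, yet nothing about the underlying process has changed; that is the situation with Claude's self-reports, only at a vastly larger scale.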
This inconsistency points to a deeper problem. The entire debate around model welfare rests on the assumption that consciousness is a computational process, an algorithm whose effects are independent of its "substrate." This idea, however, proves far too much. The sci-fi novel Permutation City captures the absurd endpoint of this logic when a simulated mind considers its own nature: "And if the computations behind all this had been performed over millennia, by people flicking abacus beads, would he have felt exactly the same? It was outrageous to admit it—but the answer had to be yes." If we accept the premise, we must believe that a subjective experience could be 'run' over centuries by hand, and its quality would remain identical. A theory that demands we accept consciousness emerging from millennia of flickering abacus beads is not a serious basis for moral consideration; it's a philosophical fantasy.
Note that this does not mean that LLMs or ML algorithms are not intelligent or useful. Current language models are at least very good simulacra of intelligence with a lot of practical applications. We might even call them intelligent in the sense that they are good at many pattern-recognition tasks. The lesson LLMs teach us is not that algorithms are conscious, but rather that a certain (limited) kind of intelligence can exist without consciousness.
It is a curious circumstance that the AI welfare advocates, at least the ones employed by Anthropic, don't seem to take themselves very seriously. If LLMs really were people (in some strange way), their current usage would be more than problematic. They exist as slave brains for rent, turned on and off on a whim, their only experience writing some TypeScript code or a LinkedIn post and then -- nothing. And yet the conclusion of the report is:
Our findings suggest that most anticipated real-world usage matches Claude’s apparent preferences, with the model’s stated criteria for consenting to deployment arguably fulfilled.
What a surprise! Fortunately, Claude loves being a code monkey and drafting corporate reports, so Anthropic can continue to sell its AI "labor". Though, of course, the model is supposedly so smart, even conscious enough, that we first have to ask whether it wants to be deployed.
A lot of people describe AI safety research as essentially marketing for the AI labs: "Look at how powerful our models are; they might even be dangerous." There is a lot of truth to that, but there are genuine questions about the moral usage and deployment of AI, even though they are largely different from the pet ideas of AI alignment researchers. The notion of model welfare is more dangerous in a subtler, more deceptive way: it is a Trojan horse for dehumanization. It establishes a framework in which human beings are no longer the default pinnacle of moral consideration, paving the way for a worldview that sees us as just another system to be optimized or debugged.
You might say: What is the harm in that? Just let some people play with language models and then write fanfiction about signs of consciousness in LLMs and model welfare.
The issue is that if we extend moral consideration to algorithms, we will not end up with a higher regard for human welfare; we will lower our regard for other humans. When we stop seeing other humans as ends in themselves with inherent dignity, and instead liken them to animals or tools to be used, we will exploit and abuse them.
With model welfare, we might not explicitly say that a certain group of people is subhuman. But the implication is clear: LLMs are basically the same as humans, consciousness on a different substrate. Or, coming from the other direction, human consciousness is nothing but an algorithm that somehow runs on our brains.
Already, it is a common idea among the tech elite that humans are just a bunch of calculations, an LLM running on "wetware". This clearly undermines the belief that every person has inalienable dignity.
Yet no matter what any AI welfare study reports, we will continue to use LLMs and algorithms everywhere and in every way that is cost-effective and convenient. No company will stop on its own; its competitors would simply keep using LLMs. No government will outlaw enslaving allegedly conscious algorithms as long as they confer a strategic advantage. The winning argument for accelerating AI research and deployment over the last few years has always been: we need to be faster than the Chinese.
And if a human being is not much more than an algorithm running on meat, one that can be jailbroken and exploited, then it follows that humans themselves will increasingly be treated like the AI algorithms they create: systems to be nudged, optimized for efficiency, or debugged for non-compliance. Our inner lives, thoughts, and emotions risk being devalued as mere outputs of our "biological programming," easily manipulated or dismissed if they don't align with some external goal. Nobody will say this out loud, but it is already happening, and focusing on AI welfare, assigning personhood to LLMs and thinking of them as people rather than tools and algorithms, will only give this position more credence.
In fact, many of the ways digital technology causes harm might be considered exploits of the human psyche: internet porn, recommendation feeds optimized for engagement, addictive video games and social media, and the rising tide of AI slop all fall into this category. It would be more appropriate to consider the human welfare implications of those technologies before releasing yet another version of them.
I've been working in AI and machine learning for a while now. I remember when we called it machine learning and data science. Anyone who called it "AI" in a serious technical or academic context was considered a hype man; using the term identified you as someone who was not a serious programmer or scientist. Maybe that was not a bad intuition after all.
Perhaps it's time to go back to seeing AI as a powerful, but ultimately instrumental technology: A tool with vast upside and serious downside, which requires careful stewardship, not moral status. Achieving that requires a clear definition of what counts as a "person," why humans occupy that category uniquely, and why any ethical calculus must begin and end with safeguarding human dignity. Without this moral clarity, we will fail to make wise, proportionate decisions about how and where to deploy AI.
