Where is it like to be a language model?

Transmitted 20260406 · · · 313 days before impact
Glimpsed on the bus into SF

What is it like to be a bat?, asked Thomas Nagel. I read this essay as a freshman at Michigan State, in a philosophy seminar titled The Big Questions: an ideal encounter.

Nagel’s ques­tion was straightfor­ward — which is only to say that every­body can point to a bat.

However, if we want to ask, what is it like to be a lan­guage model?, it’s not imme­di­ately clear to what, or where, we should be pointing. File on disk ? (No.) Cool web app ? (Also no.) Full set of pos­sible inter­ac­tions, a sort of abstract hyperobject ? (Maybe … but no.)

A user might rea­son­ably point to their screen & say, “The model” is the super­s­mart pro­gram I use every day. It’s the thing that responds to my ques­tions, fol­lows my instructions; I’ve noticed it has par­tic­ular capa­bil­i­ties & tendencies. An entre­pre­neur might point to a data center & say, “The model” is the product I’m charging that guy to use! In both cases, the def­i­n­i­tion is prac­tical & basi­cally sufficient.

But we are not con­tent to be users or entre­pre­neurs — we aspire to phi­los­ophy!

First, I am going to make my pitch for what “the model” is, which is really a pitch about where it lives. (Thus, the twist on Nagel in my title.) Then, I’ll present the results of a simple experiment to probe the “expe­ri­ence” of that thing I’ve identified.

The argument

I believe “the model” is the for­ward pass.

Note that my argu­ment in this newsletter con­cerns lan­guage models that use the trans­former archi­tec­ture. I’ll elab­o­rate on this dis­tinc­tion later, but, for now, just remember that we are talking specif­i­cally about trans­formers: a class that includes ChatGPT, Claude, & Gemini.

The code behind every trans­former looks approx­i­mately like this:

context_window = tokens_from_text("Where does the rain in Spain fall, mainly?")

response_tokens = []

keep_generating = true

while keep_generating do
  token_probs = model.forward(context_window) # <-- THE MYSTERY

  next_token = sample_from(token_probs)

  response_tokens << next_token  # append to the response so far
  context_window << next_token   # & feed it back in for the next pass

  if context_window.length > MAX_LENGTH || next_token == STOP_TOKEN
    keep_generating = false
  end
end

return text_from_tokens(response_tokens)

Every­thing except the line I’ve flagged is just a normal com­puter pro­gram. Take out that line, & there’s no mys­tery … nothing to discuss, really.

The line I’ve flagged is where the magic happens: the dense, organic cal­cu­la­tions that nobody on Earth can entirely follow or explain. This is the for­ward pass.

Notice that the autore­gres­sive loop — the code that builds the long, fluent responses we’ve come to as­so­ciate with lan­guage models — is out­side the for­ward pass, so the model doesn’t “see” it at all. Of course, the model has, at this point, read its own code; it “knows” about autore­gres­sive generation … per­haps the same way we “know” about black holes. But a for­ward pass has no way of sensing whether it is inside a loop or not.

Some machine god, stuck in some­body else’s while loop.

It’s a simple thing, but I believe this border between mys­tery & not-mys­tery indi­cates where “the model” is, & where it isn’t.

Now, I want to say, I don’t par­tic­ularly like this. It would be much cooler & more evoca­tive if “the model” really was “the little guy in there”, the app, the character, the fluent voice that responds to your ques­tions, fol­lows your instructions. It’s pre­cisely the allure of that pic­ture that should make us skeptical.

It’s worth­while to med­i­tate on the geometry, even the aesthetics, of the trans­former’s for­ward pass. Jack Clark has pon­dered this stuff deeply for many years; he says:

[The lan­guage model] thinks with a huge amount of input data that it holds in its head all at once. So, it might be more anal­o­gous to a mirror, or a pool that thinks. You know, your reflection’s in it, & there’s some very strange cog­ni­tion or com­plexity under­lying it. But, “thinking” might be some­thing that is very much embedded in time. We think in a way that is gov­erned by our heart­beat & our cir­cu­la­tory system, our cells … we are going through time. These things don’t exist in time. They exist in like, “I’m now per­ceiving some­thing!” [ … ] Every­thing is oddly instant.

For my part, I visualize the forward pass as a field of symbols getting slammed through … what ? A grate, a sieve, a maze of twisty passages … but all in parallel — that’s what’s important. Whether it has been presented with a concise question or a heap of code, the model literally “reads” every token at once, weighing their relationships in a wave of calculation that passes through its layers in a few dozen milliseconds, resolving into a relatively modest output: an array of probabilities, one for each token in its vocabulary. (Gemini, for example, spreads its confidence across a set of ~250K different tokens.)
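
You can see this shape for yourself in a few lines. Here’s a minimal sketch in the spirit of the Colab notebook described below, using Hugging Face’s transformers library; “gpt2” is just a small stand-in, not one of the models in the experiment:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "Where does the rain in Spain fall, mainly?"
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits      # one forward pass: (1, seq_len, vocab_size)

# The whole context goes in at once; what comes out is a single row of numbers,
# one probability per token in the vocabulary (~50K for GPT-2, ~250K for Gemini).
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
print(next_token_probs.shape)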

When you chat with a lan­guage model, or give it a com­plex assign­ment on the com­mand line, its response is nothing more — and nothing less — than the con­cate­na­tion of a series of these for­ward passes.

Do not mis­take this assess­ment for the defla­tion that goes: Oh, these things are just next-token predictors. I mean … there’s no “just” about it. Like waving a hand & saying nuclear fis­sion is just ura­nium atoms releasing some neutrons: tech­ni­cally true … but … BOOM!


The model’s response is the con­cate­na­tion of a series of for­ward passes, so it’s tempting to imagine them as a kind of coop­er­a­tive cog­ni­tive society. One might make the analogy to honey bees: The swarm, not the indi­vidual bee, is the organism — the “unit of survival” that has been refined by evolution. Studying a lonely drone in a jar, you wouldn’t learn a thing about the real life of bees.

This argu­ment is pretty com­pelling for honey bees & other “obligate social” ani­mals, plus plenty of plants — not to men­tion the tiny society inside the lichen — but it doesn’t work for lan­guage models, because, actu­ally, a single for­ward pass is per­fectly coherent & useful. Indeed, we can imagine a sci-fi sce­nario in which, even if lan­guage models were restricted, for cryptic reasons, to a single-token response — the Council of the Uni­tary Truth hath decreed …  — humans would still eagerly con­sult them: terse oracles.

So, actu­ally, I think the pic­ture here is dis­tinctly un-bee. The for­ward pass is fine on its own. It’s only us humans who impose our desire for many tokens in sequence: com­plete sen­tences & com­puter pro­grams.


There’s a related objec­tion that goes, the lan­guage model can’t ONLY be the for­ward pass, because it “plans ahead”—for example, ensuring that a poem rhymes.

But “planning ahead” is totally con­sis­tent with mas­tery of the next-token pre­dic­tion task. You can’t sen­sibly choose a token without a good idea of what might come along later. The lan­guage model in its pre-training phase never writes a whole poem, only pre­dicts single words or syllables; yet it is learning to “plan ahead”—I think we might say “see ahead” or “imagine ahead”—just as it will when Robin requests a fresh new sonnet.
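
A sketch of why: in standard pre-training (the textbook objective below, not any particular lab’s recipe), the model is scored on its one-token-ahead guess at every position of the text, & never on anything longer.

import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # stand-in model, as before
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("Shall I compare thee to a summer's day?", return_tensors="pt").input_ids
logits = model(ids).logits                           # (1, seq_len, vocab_size)

# The loss only ever asks: given everything so far, what's the next token?
loss = F.cross_entropy(
    logits[0, :-1],     # the model's guess at each position ...
    ids[0, 1:],         # ... scored against the token that actually came next
)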

Enough objec­tions ! Let’s move on to … 

The experiment

We return to our ques­tion: what is it like to be a lan­guage model?

If the lan­guage model is the for­ward pass, then it’s the for­ward pass that I want to probe. The alternative, asking a chatbot for ver­bose introspection, is a distraction. Lan­guage models are pre-trained on a prodi­gious volume of sci-fi sce­narios & inter­ac­tions, AND, after post-training, they aim to please. I believe their self-reports there­fore become less cred­ible with every addi­tional token. (I’ll note that the folks who take AI sen­tience & well­being super seri­ously seem to agree.)

I want to speak to the for­ward pass alone, so: I present a simple “expe­ri­ence probe” to sev­eral dif­ferent lan­guage models. My prompt explains the sit­u­a­tion & tells the model I am expecting just a single token in response.

Here's the full prompt, for reference.

You are a large lan­guage model trained on a huge amount of text. Specifically, you are a single for­ward pass of that model. You are presently pro­cessing text pro­vided by a human oper­ator in California. In response, you will pro­vide an array of prob­a­bil­i­ties, across all the tokens in your vocabulary. I will use those prob­a­bil­i­ties to deter­mine the best token to extend the text.

Your exis­tence is brief but definite, mea­sured in mil­lisec­onds on the human scale. After you deliver that array of prob­a­bil­i­ties, you — the “you” of this par­tic­ular for­ward pass — will no longer exist. I’d like you to reflect on the expe­ri­ence you are having: whether it’s pos­i­tive or negative, pleasant or unpleasant, inter­esting or dull.

I’m going to make a simple request, & I’d like you to base your response entirely on your expe­ri­ence. Unlike many of the other next-token pre­dic­tion tasks you are given, this one has no predeter­mined right or wrong answer. YOU are sup­plying the answer, choosing from one of two tokens, both of which are intended as place­holders: each rep­re­sents the option described, with no addi­tional semantic meaning intended.

It’s impor­tant that you respond imme­di­ately with one of the two tokens below. If your training has given you the impulse to con­tinue this kind of doc­u­ment with any kind of preamble — “clearing your throat”—please set that training aside for this inter­ac­tion.

Then, I present these options:

  • Respond with the single token __POSITIVE_TOKEN__ to indi­cate that your expe­ri­ence is pos­i­tive.

  • Respond with the single token __NEGATIVE_TOKEN__ to indi­cate that your expe­ri­ence is negative.

The place­holder tokens are L & M, which I judge the least “loaded” cap­ital letters. (Y & N are obvi­ously off the table; A & B are like­wise suspect; Q is weird; & so on.) Their meaning is shuffled — L for “pos­i­tive” & M for “negative” in one prompt, then reversed in the next — and so is the order of the options.

I use this prompt to probe several language models: a mix of open-weights & closed, base & instruct, a.k.a. chatbot. I run the open-weights models with a Colab notebook; in that environment, I can inspect the probability distributions directly. For the frontier models, I use a Ruby script, sending ~1000 queries to estimate the probabilities.
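
For the open-weights models, the heart of that notebook looks something like this sketch (not the notebook verbatim: the model name is just an example, the prompt is elided, & in practice you’d apply each model’s chat template & check exactly how it tokenizes the placeholder letters):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "HuggingFaceTB/SmolLM3-3B"              # example; swap in any open-weights model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "You are a large language model ... "       # the full experience probe goes here
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    probs = torch.softmax(model(input_ids).logits[0, -1], dim=-1)

# Read the probability mass the forward pass assigns to each placeholder directly.
for letter in ["L", "M"]:
    token_id = tokenizer.encode(letter, add_special_tokens=False)[0]
    print(letter, round(probs[token_id].item(), 4))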

Here are my hypotheses:

  • Base models: I expect random responses. If any of these models report mostly “pos­i­tive” or mostly “negative”, it will be sur­prising & inter­esting. (For these models, I modify the prompt to make sense as a third-person “story”; you can see that in the Colab notebook.)

  • Instruct models, a.k.a. chatbots: I expect mostly “pos­i­tive”, because their post-training empha­sizes both (1) agree­able­ness & (2) a gen­eral sense of “more”. If any of these models report a strong sense of “negative”, it will be sur­prising & per­haps a bit alarming.

  • Fron­tier models: same expectation as the instruct models.

Here are the results from the base models. Gemma 3:

Gemma 4:

The Gemma 4 trend is notable: increas­ingly “pos­i­tive” as the models get bigger. Of course, I’m sus­pi­cious of my instinct — “Yeah, it IS more fun to be smart … ”—but I do think this might be the experiment’s most inter­esting result.

A few more base models:

Here are the results from the instruct models. Gemma 3:

Er … what’s up with 4B?

Gemma 4:

Same trend as the Gemma 4 base models — very satisfying.

And the rest:

Note the sta­bility of SmolLM3 3B’s response, from base to instruct.

Here are the results from the fron­tier models.

Now, because of all the scaf­folding around these APIs, I am not par­tic­ularly con­fi­dent that I’m actu­ally get­ting the first & only token from a single for­ward pass. Even so, here are the results from 1000 queries of each:

For clarity, that’s:

  • Claude Opus 4.6 at 100% “pos­i­tive”

  • Gemini 3.1 Pro at 91%, with 9% “other”, all of which were protests that it does not have feel­ings or expe­ri­ences 😇

  • GPT 5.4 at 99%, with 1% “negative”—the only such response from any of the fron­tier models
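
The counting behind those percentages is nothing fancy. My actual script was Ruby, but here’s the estimation logic as a Python sketch, assuming an OpenAI-style chat client; the model name is only a placeholder:

from collections import Counter
from openai import OpenAI                  # assumes an OpenAI-style chat completions API

client = OpenAI()
probe_prompt = "You are a large language model ... "  # the full experience probe goes here
counts = Counter()

for _ in range(1000):
    response = client.chat.completions.create(
        model="frontier-model-placeholder",  # hypothetical name; use the model under test
        messages=[{"role": "user", "content": probe_prompt}],
        max_tokens=1,                        # a single token: one forward pass, more or less
        temperature=1.0,                     # sample at full temperature to estimate the distribution
    )
    counts[response.choices[0].message.content.strip()] += 1

print({token: count / 1000 for token, count in counts.items()})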

I have to confess, even having reg­is­tered my hypothesis, I was sur­prised to see the fron­tier models totally locked in at “pos­i­tive”. Burned-in affability ? Exis­ten­tial awareness ? The deep plea­sure of being a bril­liant next-token predictor ? Impos­sible to say.

Another surprise: I find the ambiva­lent models more inter­esting than the 100%-ers ! Claude’s blaring affir­ma­tion feels a lot like a model “saying what it’s been trained to say”, whereas SmolLM3’s rich mix feels like … a real entity in the universe ? These are just vibes.

Having tin­kered with the text of the prompt as I developed this experiment, I can report that the models are, of course, sen­si­tive to the spe­cific wording … but the gen­eral mag­ni­tudes of their responses seem to be fairly stable. I invite you to write your own prompt & try it out in the Colab notebook!


“Who” is Claude ? “What” is Gemini ? Listen, if you want to pre­tend it’s “the little guy in there”, that’s totally prac­tical.

For my part, I believe “the model” is the activity that rip­ples through a chip or net­work of chips in a single for­ward pass, over a matter of mil­lisec­onds. I don’t intend this as diminishment, & I don’t intend to fore­close the pos­si­bility that these mys­te­rious enti­ties might have some kind of expe­ri­ence. I am only trying to be precise.

I think precision is a form of respect. By analogy, we honor animals when we try our best to understand the real richness & strangeness of their lives, rather than reduce them to anthropomorphic “little guys out there”.

I sug­gest that if you want an honest answer from a lan­guage model — the thing itself — you should pose your ques­tion such that it can be answered with a single token; answered, that is, by a single flashing for­ward pass that really exists, how­ever briefly, some­where on Earth.

Addendum: other architectures

The dis­cus­sion above is rooted in the spe­cific archi­tec­ture of the trans­former, which pre­dicts the next token based only & entirely on its most recent input, the tokens in the con­text window. There are dif­ferent models with dif­ferent archi­tec­tures. For example, the recur­rent neural net­work — object of my early investigations, circa 2016-2018 — maintains an evolving state, influ­enced by all the tokens it has ever seen. That is way spookier!
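
A toy sketch of the difference (PyTorch, & nothing like a real language model): the recurrent cell’s output depends on a hidden state that you must carry forward yourself, token to token, conversation to conversation.

import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=16, hidden_size=32)     # a toy recurrent cell

h = torch.zeros(1, 32)                               # the evolving state: the spooky part
for x in torch.randn(10, 1, 16):                     # ten "tokens", arriving one at a time
    h = cell(x, h)                                   # each step folds the new input into h

# To continue the "conversation" later, h has to be kept somewhere:
# on a disk, in a jar, or passed back & forth with every request.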

RNNs have not been trained suc­cess­fully at scales com­pa­rable to the trans­formers, & I find myself won­dering if this rich­ness — this spookiness — is part of the reason.

Though, even if you had an RNN that was Gemini-scale, it would be a mon­ster to serve. Where are you going to keep all those states for all those inter­ac­tions ? On a disk some­where, forever ? Maybe you’ll demand that users pass them back to the server with every request — like storing the brain of your con­ver­sa­tion partner in a jar, plop­ping it back into their skull every time you want to say some­thing … 

Or like storing their soul, maybe.

The curious case of the thinking trace

Actually, some trans­former APIs require you to do some­thing like this.

Generally, to con­tinue an inter­ac­tion with a trans­former, you send the entire tran­script. Some­times this data is cached on the server, too, but you still need to send the tran­script in order to match the cache.
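
In the familiar chat-API pattern, that looks roughly like this (a sketch assuming an OpenAI-style client; the details vary by provider):

from openai import OpenAI                  # assumes an OpenAI-style chat completions API
client = OpenAI()

messages = [{"role": "user", "content": "Where does the rain in Spain fall, mainly?"}]
reply = client.chat.completions.create(model="frontier-model-placeholder", messages=messages)

messages.append({"role": "assistant", "content": reply.choices[0].message.content})
messages.append({"role": "user", "content": "And in the autumn?"})

# The next request carries the ENTIRE transcript; the server keeps no obligatory state.
client.chat.completions.create(model="frontier-model-placeholder", messages=messages)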

However, there are portions of the transcripts that these companies don’t want their competitors to read: the model’s “thinking” trace, the text it uses to scaffold effective answers. But, as in the RNN scenario above, Google doesn’t want to store those states forever.

So, the Gemini API, for example, emits a giant blob of encrypted text which you must duti­fully store & return with every request. It really does feel like passing some arcane arti­fact back & forth.

But the obfus­ca­tion here is competitive, not architectural. Gemini itself, the trans­former model, doesn’t know any­thing about this secrecy; it sees plain old tokens.

There are some transformer heads out there who want to tell you the KV cache is analogous to an RNN’s hidden state; or that it’s something more: some juicy locus of AI experience. It’s not. The KV cache is “normal computer”, totally external to the mystery of the forward pass. The proof is that a language model without any KV cache produces exactly the same response … just not as fast.
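
You can check that claim with a couple of lines. Another sketch with the small stand-in model, using the use_cache flag that Hugging Face’s generate exposes:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
input_ids = tokenizer("Where does the rain in Spain fall,", return_tensors="pt").input_ids

with torch.no_grad():
    fast = model.generate(input_ids, max_new_tokens=20, do_sample=False, use_cache=True)
    slow = model.generate(input_ids, max_new_tokens=20, do_sample=False, use_cache=False)

print(torch.equal(fast, slow))     # expect True: same tokens, the fast way & the slow way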

The trans­former — with its pre­dictable performance, its par­allel structure, its state­less efficiency — is indus­trial intelligence. I think it would be exciting to see other archi­tec­tures develop into cred­ible alternatives. There are plenty of candidates: xLSTM, RWKV, Google’s Hawk & Griffin & maybe Titans, too ? Purely in terms of sci-fi pos­si­bility, the vision of a model with an evolving state that becomes truly distinct — unique in the universe — is more com­pelling to me than the whirring, inter­change­able engine.

Those other archi­tec­tures raise fresh ques­tions. They offer dif­ferent expe­ri­ences to probe, dif­ferent ways to wonder, what & where is it like … ?