I don’t know how many times I’ve walked into a room, stopped in the doorway, looked around, and thought: I know there was something I was going to do in here, but what?
While I usually wouldn’t think twice about it, over the past few months, I’ve been working with LLMs and building systems that use them. The other day, standing in the doorway of my home office, unsure of why I’d come, it occurred to me that this was a recognizable failure mode of an AI agent: a compression of the context window that lost crucial information.
As a result, I’ve found myself reaching for large language models and AI agents as a metaphor for my own thinking. This isn’t too surprising. Neural networks were originally inspired by how neurons in the brain are wired, and LLMs sit within that lineage, so the resemblance isn't entirely coincidental. But in this case, the inspiration goes in the other direction. Mapping concepts from LLMs onto my own cognitive experiences seems to make them more legible. I don’t take this metaphor too seriously; it breaks down in too many places. But as a working sketch, it has begun to provide a surprisingly tangible shape to certain puzzling aspects of my lived mental landscape.
For this metaphor to do any work, a few terms need to be on the table, but if you’re already familiar with context windows, system prompts, and the concept of a harness, feel free to skip ahead.
At this point, most folks know that a large language model, or LLM, is a system that produces text one piece at a time by predicting what comes next. The prediction is shaped by what’s called a training run, a long process during which the model is exposed to enormous amounts of text and gradually adjusts a vast set of internal numbers, called weights, to sharpen its predictions. By the time training is done, the weights encode a kind of compressed map of the patterns the model has seen. After training, the model can be used for inference, which is the process of using the trained weights to generate a response from an initial prompt.
What you may not know is that when you chat with an LLM, the model doesn’t remember what was said previously. It sees it as a single block of text called the context window. Each time you send a message, the model is shown the entire conversation up to that point. This includes your messages, its previous responses, and often a hidden system prompt that a software developer wrote to shape how the model behaves. The model reads all of this as if for the first time and generates the next response.
The context window has a limited size. Once it fills up, older content has to be dropped or summarized, so a long conversation eventually requires compression, collapsing earlier turns into a summary so the model can continue without losing the thread entirely. Compression preserves the gist and loses the specifics.
Several knobs called hyperparameters govern how the model produces its output. The most familiar is temperature, which controls how much randomness the model injects into its next word choice. A low temperature makes outputs more predictable; a high temperature makes them more varied. These knobs don’t change the model’s training; they change how the trained model behaves when it’s running.
An AI agent is what you get when you wrap an LLM in a loop. The loop hands the model the current state, takes the model's response, acts on it, feeds the result back in, and repeats. This loop and everything that supports it is called the harness. The model alone only takes text in and produces text out. An LLM running inside that loop, with all the harness machinery around it, is what people mean by an AI agent.
Yesterday, my wife said something, and I responded. She looked at me, perplexed by my answer. It wasn’t that I was angry; it was an innocent back-and-forth. But I was also perplexed. Why did I respond in the way I did?
Because our own thoughts and reactions often feel natural, we generally don't question how we arrive at them. But when they feel incongruent, we have trouble working out where they came from, and doing so often involves running through a few hypotheticals: How would I feel if X happened? What would I do if Y were true? We are so used to running these hypotheticals on the fly that we don't always recognize that the thought or reaction came first and the explanation came second.
The character of my response to my wife was shaped by something in my brain that works like the weights of a model: trained on every minute of my life so far, generating outputs by inference rather than deliberation. And just as an LLM can’t examine its own weights but can only run prompts through the algorithms that use those weights, our own brains can’t examine the neural pathways through which most of our mental activity happens. The weights analogy helps ground several familiar concepts: the subconscious, intuition, common sense. All of these touch on the fact that there is a capable mechanism by which most of our thoughts and reactions are generated; it’s just that its internal operation is opaque to us.
Then there is the experience of my mind right now: the internal monologue, the sensations I’m attending to, the action I’m in the middle of. It is that inescapable feeling that there is something it is like to be me at any particular waking moment, the question Thomas Nagel famously raised in his essay on what it is like to be a bat. Consciousness, in this sense, is something like a multi-modal context window: a limited, working representation of what’s currently relevant. It is transient but continuous: any given moment is held alongside a span of recent ones, and as new moments arrive, the oldest fall away. An LLM session’s context window carries forward what has been said in recent turns, but only up to a point. After that, the context must be compressed into a summary that gestures backwards at material no longer there. Conscious experience does the same.
And the whole apparatus—the loop that generates a thought, acts on it, takes in the result, and generates the next one—is the harness. Each cycle of the loop is, in effect, prompting my own mind: given everything that has just happened, what comes next? The harness is the thing that keeps generating those prompts, moment after moment, on its own.
The harness, in this metaphor, is me.
Let’s push on this LLM-to-mind mapping a bit and consider the fact that the same input, like a question from a friend or a Slack message from a coworker, can produce wildly different responses from me depending on what kind of mood I’m in. The input appears to be the same, so what’s different must be something upstream of the input.
The closest analog in the LLM stack is a pair of mechanisms that shape any given response before it happens: the system prompt and the hyperparameters. The system prompt is a set of instructions that the model can’t see in the conversation but is conditioned on at every step. The hyperparameters determine how the LLM algorithms execute, like how much randomness to inject into the process. Together, they configure the inference. The same input run through different settings will produce different responses.
Emotions seem to work like this. Unlike a thought or a sensory perception, they aren't in the context window, but are upstream of it. Anger reads internally as an injected instruction (INJUSTICE HAS OCCURRED; ATTEND TO IT) and also as a retuning of the whole system that narrows attention and speeds up response. Sadness reads as both an instruction (NOTHING HERE MATTERS; THE FUTURE IS NOT YOURS) and a global dampening that quiets the world and pulls back from action.
This framing explains why you can’t reason your way out of fear or grief. You can add “actually, this is fine” into the context window of your conscious experience all you want; the prompt is still saying DANGER, the hyperparameters are still narrowed and tense, and every thought that comes back from your subconscious is conditioned on both. This is also why anger feels alien in retrospect. When you cool off and replay what you said, it reads as if someone else wrote it.
With this in mind, some notable human practices start to look like operating on the system at one layer or another. Journaling offloads state from the context window into durable storage. Meditation clears context and weakens the grip the current configuration has on you. Cognitive reframing attempts to rewrite the system prompt directly. Therapy is slow, targeted reinforcement learning to alter weights in the model that have calcified over time. Exercise changes the chemistry from which the hyperparameters are derived. All of these are ways we've developed to refactor ourselves. They exist precisely because the system prompts and hyperparameters of our emotions aren't things we can directly control.
One of the ideas I remember best from philosophy classes in undergrad is a small, strange perceptual effect called color phi. It demonstrates something about perception that intuitively feels false: that what we see is not just received from the world but constructed by the brain. The mind-as-LLM metaphor doesn't explain why that's true, but it does make it feel less alien.
The philosophical puzzle is grounded in an experiment from the 1970s, run by Paul Kolers and Michael von Grünau, that goes as follows:
You sit in front of a screen. A colored dot (say red) flashes briefly on the left side of the screen and then disappears. A fraction of a second later, a different-colored dot (say green) flashes briefly on the right side and disappears. If the timing and spacing are right, you don’t see two dots. You see one dot, traveling across the screen. And you see it change color at some point during its journey, before the second flash has even occurred. This is the part that should seem impossible to you.
There is no moving dot. There is no color change. The brain has constructed a smooth perception of motion and transition out of two discrete flashes. And it has done so retroactively: at the moment you experience the dot turning green halfway across, the green flash hasn’t happened yet, so the brain cannot have known there would be a color change. Whatever produces your conscious report of “I saw a red dot become green” must be operating on a delay long enough to have already integrated the second flash, and then writing the resulting story backward into your apparent experience of the first.
The philosopher Daniel Dennett built a whole theory of consciousness around this experiment in 1991 (his Multiple Drafts Model), and the basic shape of his argument has held up. What I want to add is that the LLM suggests a mechanical picture of what’s happening.
When you call an LLM, the model sees the entire context as a single input. There is nothing privileged about “what was said earlier” versus “what was said later”. It’s all just tokens, and the model conditions its next output on all of them at once. This means you can do something that would feel deeply weird if you tried it in a real conversation: between turns, you can edit the earlier parts of the transcript. You can rewrite what the model “said” three turns ago, and the next time you sample, the model will proceed as though that’s what it had always said. Its “memory” of the past has been overwritten by its present context.
Color phi suggests the brain is doing a version of this in real time. The experience of the moving, color-changing dot is a context window that has been edited after the fact to be internally consistent with the second flash. The “memory” of seeing motion is being written at the same moment as the perception of it. There is no point at which an unedited version was ever conscious.
This is an unsettling thing to learn about yourself, but it stops being mysterious once you have the right picture. Of course, experience is editable. The thing producing the experience has access to the whole context at once.
There is one more puzzle I want to put through the wringer of this mind-as-LLM metaphor, and I'll be honest, I'm probably pushing it to its breaking point.
Philosophers have spent four hundred years worrying about what gets called the Cartesian theater: the felt sense that experience is a stage, and that some inner self is watching the stage from the audience. The trouble is that the watcher needs a mind of its own to do the watching, which would have to contain its own theater, and so on, all the way down. The picture seems unavoidable and impossible at the same time.
LLMs already give us something that looks a lot like the stage. The context window of a contemporary multimodal model is a rich, structured stream of our inner thoughts and sensory experiences, held together as a single working representation. If you wanted to build a Cartesian theater, you could do worse than that. The thorny question has always been the audience.
The agent loop answers the audience question in a way I find somewhat satisfying. There is no audience. Or rather, the actors are performing for themselves, generating the next moment from the current one, in a loop that doesn't need anyone in the seats to keep going. Conscious experience is an endless ripple of self-prompting from one moment to the next. A self-prompting LLM agent, of the kind that can be built today, is the closest mechanical picture I know of that suggests what this might look like: a system in which there is nobody in the seats because the seats are also the projector.
The self as an LLM wrapped in a harness that prompts itself, with consciousness as the ever-changing context, is at best a way of standing next to a few things that used to feel stranger to me than they do now. It doesn’t explain emotion, or perception, or the felt continuity of a self watching its own thoughts. But it does give those puzzles a different shape, one I find I can hold more easily.
It also doesn’t address what may be the most perplexing aspect of consciousness, the fact that it exists at all. This is what David Chalmers named the hard problem of consciousness, and the metaphor makes nothing about it feel less alien, gives it no different shape, sketches no working version of it. Why there is something it is like to be the loop, rather than nothing, is left exactly where the philosophers found it. I don’t expect any metaphor to close that gap, and I suspect the right answer, if there is one, will look nothing like the language we currently have for thinking about minds.



