This piece is a more accessible write-up of arguments from a paper currently under review. Will the paper be published? Anonymous Reviewer willing, we can hope. In the meantime, I feel the topic is important and neglected. Enjoy!
The Causality Dogma
In a now-forgotten 1975 chapter in Language, Mind, and Knowledge, Noam Chomsky responded to the idea that human and animal communication systems differ merely in the degree of their outputs. In doing so, he drew attention to (what he argues is) a distinctive characteristic of human language (all bolded emphases added, throughout):
This signaling system differs from human language not only in that it is far greater - not numerically more limited - in scope, but also, more importantly, in that it is directly associated to external stimulus conditions; it is, in short, a signaling system, like a speedometer, and not a language in the human sense, a system that is available for free expression of thought precisely because it is not tied directly to external stimuli.
After a characteristic reference to Cartesian philosophers - who understood human language use to “[escape] the inherent limits of mechanical explanation” - Chomsky says:
…human language differs qualitatively in this respect…in that it is in principle an infinite discrete system rather than a continuous system, and that it is related to external stimuli not by the mechanism of stimulus control, but by the much more obscure relation of appropriateness.
The notion that human language is recruited not in direct, controlling association with external stimulus conditions - the very situations in which we routinely deploy our linguistic resources - is not intuitive; common wisdom holds that all actions, human or otherwise, have causal antecedents which need only be identified to bring those actions under the heel of scientific explanation.
Chomsky appears to be doubting, therefore, that human language use is analogous either to animal communication systems or to machine behavior; the latter two act in direct association with external stimulus conditions, while the former is detached from them.
In simple terms: Chomsky is doubting that human behavior exists within a causal structure, at least in the domain of language use.
Consider Richard McDonough’s comments on the matter from 1994, some twenty-eight years before Large Language Models (LLMs) led to widespread insanity. Here, he responds to the insistence that all observable phenomena must be accountable in mechanical terms:
It is this mechanistic research program which has trouble with the facts. No one has given anything like a plausible mechanistic explanation of how one knows how to respond appropriately in the relevant normative context of institutions, customs, practices, standards, etc. That is, no-one has given anything like an account of how these familiar features of human life could be reduced or cashed in terms of the sort of properties accessible to a mere machine (like a photo-electric cell).
This comment may seem bizarre as one reads it - causal accounts may be temporarily lacking, one thinks, but such an account could in principle be offered to explain appropriate human behavior. Pocket this sense, for a moment.
McDonough continues:
One can build machines which can eject syntactic structures in impressive fashion. But any such structure may take different meanings in different contexts. So to explain the production of a structure is not to explain the expression of a meaning.
The target of McDonough’s remarks thus seems to be not the ability of a system to produce expressive outputs, but its ability to produce outputs such that they are appropriate to normative contexts. (If that bizarre sense is creeping up again, pocket it once more.)
To suggest that ordinary human behavior can be explained in causal terms is to suggest that one need not make an appeal beyond the internal state of the individual at the moment of speaking. The characteristics of the internal state are sufficient to explain the behavior. External forces may impinge on this internal state, though the reason for an ‘output’ from the mechanical system that comprises a human being is a change in the configuration of various internal mechanisms that constitute their internal state. From this change, the internal state yields an output. This entire process, so conceived, is ‘mechanical’ - broadly, it is explicable in causal terms, with mechanisms posited to make sense of the relation between cause and effect.

Within mechanistic explanations, mechanisms are posited to make sense of the relationship between cause and effect. That is, there is a causal structure that one identifies which leads from a starting point (the cause) to the mechanism on which this cause acts, which then results in an outcome (the effect).
Here is an example of a possible mechanistic explanation regarding human linguistic cognition:
The identifiable nature of the external cause is not to be overlooked - if the external cause cannot be identified, such that one cannot make statements about the resulting outcome independent of that outcome, then it is unclear that one has established a causal chain of events. A linkage must be identified from the external cause to the outputs of an internal structure. (As explained below, to fail to make this identification is to empty the cause of its intended force.)
Concretely, Person B’s ability to perceive a given snippet of auditory information as a sentence in English - putting aside whether the sentence is meaningful, accurate, truthful, etc. - depends on there being an initial English utterance (the external cause). The language faculty in Figure 2 is the intermediate structure - the mechanism - posited to make sense of the relationship between the cause - Person A’s utterance - and the effect - Person B’s perception of the utterance as a sentence in English.
There is, additionally, a reliable relationship between the cause and the effect. Owing to the properties of the intermediate mechanism - the language faculty - Person A speaking English invariably results in Person B interpreting the remark as a sentence in English. No serious causal account would permit the outcome to be explained by reference to just any possible starting point; the intermediate mechanism’s internal state is altered in reliable, predictable ways. A snippet of auditory information uttered by a Person A who natively speaks Swahili may be perceived by Person B as a sentence in a language, but not as a sentence in English. The effect of perceiving a sentence in one’s native language is essentially invariant when paired with the cause, save for non-relevant confounders like neurological impairments (which may damage the intermediate mechanism).
This process can be situated in overarching explanatory statements about the language faculty and its role in this causal structure without merely re-stating the specifics of the example. It does not matter whether a native English speaker hears the verbal remarks of another English speaker on one continent or another, during the day or night, whether they are happy or depressed, in love or jealous. Person A’s manipulations of the air, which travel outward and strike the eardrums of Person B, will always and invariably be interpreted as an English sentence by Person B because of the properties of the intermediate mechanism - the language faculty.1
This process is consistent with Descartes’ characterization of a machine, presented in his “language test” to determine whether a being that resembled a human was, in fact, minded (ensouled) in ways comparable to humans.
Language use was evidence, for Descartes, of mind given that such behavior exceeded the mechanical characteristics of other natural phenomena (in his view). If a being does not exhibit this behavior, then it is mechanical - a machine. He writes:
[T]hey could never use words or other signs arranged in such a manner as is competent to us in order to declare our thoughts to others: for we may easily conceive a machine to be so constructed that it emits vocables, and even that it emits some correspondent to the action upon it of external objects which cause a change in its organs…but not that it should arrange them variously so as appositely to reply to what is said in its presence, as men of the lowest grade of intellect can do.
When Descartes says we can “easily conceive” of something, he means that it is conceivable within the mechanical philosophy of his time. And within these confines, Descartes believes we can conceive of a machine that emits words, and even of a machine that outputs those words in direct contact with an external force (that emits them as a “correspondent to the action upon it of external objects which cause a change in its organs…”). This, for Descartes, is a mechanical process. What is not mechanical for Descartes is the diversity of word arrangement and its appropriate application to a situation - a common feature of human behavior.
On Descartes’ account, the machine could not output the limitless diversity of words that humans may produce for a simple reason: the intermediate mechanisms (the relevant subcomponents of the machine’s internal state) had no way of producing an infinite diversity of structured arrangements of words in an appropriate fashion, as the mechanisms are finite; his mechanical philosophy could not make sense of this ability in humans, thus machines could not replicate it via physical means.
Descartes was wrong that a machine - being a finite, physical mechanism - could not generate the infinite diversity of outputs that characterize human languages, a fact now known in large part thanks to Alan Turing (whose own “language test” - the Imitation Game - operated in an entirely different framework than Descartes’, unburdened by the mechanical philosophy’s restrictions, yet also lower in its ambitions).
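The Turing-era point - that a finite mechanism can generate an unbounded diversity of structured outputs - can be made concrete with a toy recursive grammar. The sketch below is purely illustrative; the grammar fragment and symbol names are invented for this example and do not seriously model English. It shows a finite set of rewrite rules deriving ever more distinct sentences as the recursion bound is raised:

```python
import itertools

# A finite set of rewrite rules (a toy context-free grammar).
# Hypothetical fragment for illustration only.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],  # recursion: NP contains VP, which contains NP
    "VP": [["V", "NP"], ["V"]],
    "N":  [["dog"], ["cat"], ["idea"]],
    "V":  [["chased"], ["saw"], ["slept"]],
}

def expand(symbol, depth):
    """Yield every sentence derivable from `symbol` within `depth` rewrite steps."""
    if symbol not in GRAMMAR:          # terminal word: yield it as-is
        yield [symbol]
        return
    if depth == 0:
        return
    for production in GRAMMAR[symbol]:
        # Expand each symbol of the production, then take the cross product.
        parts = [list(expand(sym, depth - 1)) for sym in production]
        for combo in itertools.product(*parts):
            yield [word for segment in combo for word in segment]

# The rule set is finite, but the number of distinct derivable sentences grows
# without bound as the depth limit rises - discrete infinity from a finite mechanism.
counts = [len(set(map(tuple, expand("S", d)))) for d in (3, 4, 5)]
assert counts[0] < counts[1] < counts[2]
```

The recursion in `NP` is doing the work: because a noun phrase may contain a verb phrase that contains another noun phrase, no fixed list of outputs exhausts what the five rules generate.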
Nevertheless, it is fascinating to find that many proponents of the idea that all human behavior can be explained, in principle, in causal terms rely on something akin to Descartes’ own view, though applied to the full spectrum of human behavior. We are all complicated machines, as it were. To explain my action of writing these words at this moment, no appeal beyond my internal state - the various configurations of my internal physiological and neurological structures - need be made. This is not to say that external forces are not impinging on my internal state, but that my internal state is a set of mechanisms that exists within a causal structure that starts with some stimulus and ends with the words you see here.
If nothing else, this view is intuitive. Indeed, many today insist that causal explanations are not only sufficient for all cases, but must be sufficient; all things must be accounted for causally, with any lack of causal explanation in a given case amounting merely to temporary ignorance, not ignorance in principle.
These positions make an assumption: the scope of application of causal explanations is sufficient to account for all observed phenomena. From this assumption, human behavior, being part of all observed phenomena, is taken to be within the scope of causal explanations.
The belief that causality accounts for all human behavior is, in many formulations, a dogma. Justification is never offered and the premise cannot be questioned. McDonough is correct when he says:
It is to be expected that the contemporary defenders of the mechanistic faith simply assume the internalist dogma. But the fact that the alleged ‘critics’ of cognitive science and AI do so as well suggests that a deep philosophical assumption goes unnoticed.
And:
It has long been something of a dogma in philosophy that it is at least theoretically possible that machines can imitate any human behavior…This makes it appear as if it is only an empirical question, perhaps to be answered by technological advance, whether there can be precise machine duplications of human behavior.
He also says:
One’s list of philosophical problems depends on one’s prior philosophical assumptions…For example, the assumption that the universe is fundamentally mechanical generates the philosophical problem of explaining how intelligence is produced by stupid mechanisms. It is the fact that mechanisms seem intrinsically incapable of anything like human intelligence or creativity which lays the foundation for this problem. For a machine’s output must be a causal product of what goes on inside it. The intractability of this problem explains why most alleged machine models of creativity are really models of something else (such as ‘productivity’, the ability to generate an infinite number of sentences).
Machine models of “creativity” are really models of “productivity.” There is something to this - in particular, the idea that causal explanations can essentially collapse the distinction between a particular human competence (e.g., language processing) and the use of this competence (e.g., ordinary language behavior) without realizing the latter has been subsumed by the former.
The Causal Claims and Why They Fail
What kind of claim is being made when one suggests that human language use is caused? Two commonsense views are likely to be invoked:2
1. Individuals speak as a result of internal processes occurring within their brains (or elsewhere in the body). These processes are deterministic not in the sense that there is only one possible outcome (i.e., one possible thing an individual may say), but in that they each unfold within causal structures in which various mechanisms - like the language faculty - generate various outcomes. One of these outcomes is the act of speaking.
2. Individuals speak as a result of internal processes that are deterministically set into motion - caused - by external forces, like being spoken to. The act of Person A speaking to Person B is a cause that imposes a controlling force on Person B. The internal mechanisms within Person B’s brain/body are deterministically set into motion, the end-result of which is Person B speaking in response to Person A.
In both cases, there is a fixed relationship between local context and the utterance in question.3
These are taken in reverse order, starting with (2).
Why External Causes Fail
There are a few relatively simple observations we can make about ordinary human language use.
Individuals may or may not use their language in a given situation, despite the availability of possible causes in the local environment - factors that, in other contexts, seem to result in language use. That is, while one can identify, in a post-hoc fashion, a litany of possible causes in an individual’s local environment, these do not appear to determine whether the individual speaks (or signs, or so forth).
An example: a high-profile individual - a politician, say - may exit a courthouse while being shouted questions by reporters about their alleged wrongdoing. The politician no doubt hears these questions, and therefore is on the receiving end of what one might deem a set of causes: others using language to impel the politician to use their language (the dynamic between Person A and Person B).
Yet, we do not doubt that the politician is capable of not responding to these reporters’ questions - this is, in fact, something of a cliche; such individuals are often trying to run away from those reporters. This seems to imply a lack of causal connection between the two.
But wait! Perhaps the relevant cause for the politician’s lack of language use is not the reporters’ questions, but the lawyer who, prior to leaving the courthouse, advised the politician not to respond to reporters’ questions (“When we get out there, they’re gonna hound you. Don’t take the bait.”)
However, this merely shifts the explanatory burden in ways that might escape immediate notice: why does the lawyer’s remark exert a controlling influence on the politician, but not the reporters' remarks? And what, for that matter, is causing the lawyer to say as much? And why, if one were to suggest a cause, would other factors in the environment not cause the lawyer’s remarks?
Notice that, in the example of a causal process in Figure 2 above, there is a reliable relationship between cause and effect. Any snippet of auditory information spoken by a native English speaker will be received by another native English speaker as a sentence in English. Person B has no choice but to understand the sentence as English because it exists within a causal structure.
In the case of the politician, it appears they do have a choice - they can listen to the lawyer and say nothing to the reporters waiting outside. Or they can disregard the lawyer’s costly advice and take the bait.
Why, in this latter scenario, does the causal influence of the lawyer’s remark (to shut the hell up) not hold a controlling influence over the politician? Do we now shift the causal burden back to the reporters’ questions?
Notice that, in any event, the remarks of all participants in this scenario “fit” the situation - the lawyer uses his words to advise his client; the reporters use their words to probe into the politician’s alleged wrongdoing; and the politician - should they speak at all - uses their words to express disdain for the reporters and their questions. These expressions are appropriate to the situation, yet none appear caused by the situation in any way identifiable independent of the situation’s particular characteristics (e.g., we cannot identify a particular controlling factor in the local context without devolving into guesswork about its various characteristics).
Moving beyond this scenario, the ability to say nothing at all is perhaps the clearest indication of how human language use is causally detached from circumstances. As Chomsky writes:
If, for example, I were to take out a machine gun, point it menacingly at you, and command you to shout “Heil Hitler,” you might do it if you had reason to believe I was a homicidal maniac, but you would have a choice in the matter, even if that choice is not exercised. The situation is not unknown in the real world; under Nazi occupation, for example, many people - in some countries, the vast majority - became active or passive collaborators, but some resisted.
I sometimes think of a harrowing example concerning former U.S. Speaker of the House Dennis Hastert. Hastert was accused of (among other things) sexually abusing Steve Reinboldt, who later died of an illness. His sister, Jolene Reinboldt Burdge, recounts that Hastert showed up to Steve Reinboldt’s viewing:
I was just there just trying to bite my tongue thinking that blood was coming out because I was just…So after he had gone through the line I followed him out into the parking lot of the funeral home,” Jolene said. “I said, ‘I want to know why you did what you did to my brother.’ And he just stood there and stared at me. He didn’t say, ‘What are you talking about?’ you know, [or], ‘What? I don’t know what you’re talking about.’ He just stood there and stared at me.
Then I just continued to say, ‘I want you to know your secret didn’t die in there with my brother. And I want you to remember that I’m out here and that I know.’ And again, he just stood there and he did not say a word.
Hastert got in his car and drove away. Jolene said Hastert’s non-response “said everything.”
That one cannot reliably identify a causal relationship between the local environment and particular linguistic expressions suggests that something about the causal enterprise is insufficient for the task at hand. Indeed, in his 1959 critique of B.F. Skinner’s Verbal Behavior, Chomsky made precisely this point when arguing against the notion of stimulus-control:
If we look at a red chair and say red, the response is under the control of the stimulus redness; if we say chair, it is under the control of the collection of properties (for Skinner, the object) chairness, and similarly for any other response. This device is as simple as it is empty. Since properties are free for the asking (we have as many of them as we have nonsynonymous descriptive expressions in our language, whatever this means exactly), we can account for a wide class of responses in terms of Skinnerian functional analysis by identifying the controlling stimuli.
And for the kicker, Chomsky continues:
But the word stimulus has lost all objectivity in this usage. Stimuli are no longer part of the outside physical world; they are driven back into the organism. We identify the stimulus when we hear the response…talk of stimulus control simply disguises a complete retreat to a mentalistic psychology. We cannot predict verbal behavior in terms of the stimuli in the speaker’s environment, since we do not know what the current stimuli are until he responds.
Chomsky is here arguing that the scope of the concept stimulus-control is insufficient to the task at hand - explaining human linguistic behavior. Indeed, the effort to pair up such-and-such arbitrary linguistic expression with such-and-such factor in the local environment voids the concept of cause (or stimulus-control) of the meaning its proponents no doubt intended for it to hold; no longer is the human under investigation within an identifiable causal structure, but is instead attributed reasons for speaking that stand quite apart from any identifiable and fixed relation with the external environment. McGilvray elaborates:
It is obvious that there is no prima facie reason to think that there is a “real” causal connection between her circumstances, her uttering words at all, and the specific expression(s) she produces.
The nature of this application of causality is a post-hoc effort:
This is not the well-defined causality of serious theory, and - given the apparent impossibility of limiting the factors that might play a role in it - it never will be.
A picture emerges of a human being who is deeply inclined to respond to others as is appropriate in a given situation - to adhere to social norms, to express one’s feelings, to protect oneself, and so forth - yet not caused to do so.
Causal factors arising from outside the individual are poorly suited to this phenomenon. The appeal does not work: we cannot make sense of how an individual uses their language - or refrains from using it - in ways that are appropriate to circumstances, given the apparently unfixed relationship between speech and local contexts.
Why Internal Stimuli Fail
The claim in (1) is that individuals use their language as a result of an internal process that deterministically yields certain outputs, like language use.
On this view, the individual produces an expression in a given situation. This action is a deterministic result of internal processes; the mechanisms that comprise an individual’s internal state output expressions owing to their various configurations. Internal generation is therefore as ‘mechanistic’ as speaking, as both are part of the same process situated within a causal structure. The cause is, say, a particular firing of neurons, and the effect is the individual verbalizing words. (This is deliberately crude; don’t read into the specifics of the mechanisms.)
Linguistic expressions convey certain meanings. Once conveyed, these meanings cohere with the thoughts of others present in the situation. Indeed, such expressions may be novel in the sense of conveying a new meaning, yet the novelty of these expressions is as familiar to the individual - and those who hear it - as any other. (You have read a fair number of novel sentences in this essay thus far, and none of them confounded your linguistic comprehension.) Once expressed, a remark is given an interpretation of appropriateness by others in the situation, such that the remark “fits” the situation as judged by others.
Problems begin to arise rather quickly.
To suggest that the expressions of Person A are the outcome of Person A’s internal process of generation is to claim that the expressions are effects and the causes are physiological and neuronal (and the like) processes. Yet, why do the words of Person A, produced and uttered as a result of internal neurological processes of generation that operate independently of the given set of circumstances, converge with the thoughts that pass through the minds of Persons B, C, D, and so forth?
What allows Person A’s internal structures to generate an effect that fits a situation from which they are causally detached? Put another way, why do the internal mechanisms in Person A - their causal structure, as it were - produce expressions that align with the interpretations of appropriateness generated by the internal mechanisms of Persons B, C, D, and so forth?
Why do the internal processes of generation within my brain - causally detached, as we have seen, from the internal processes of others in a given set of circumstances - correspond with the thoughts in your brain? How did your brain impose an interpretation of appropriateness on my (novel) remarks - an interpretation driven by your internal causal structure - despite lacking an identifiable causal connection with my internal causal structure?
Aha, the critic points out - there is an identifiable causal connection! The reason we can align our thoughts through language use is that the causal structures in my body are impinged upon by the causal structures in your body; your expression caused me to generate an expression that corresponds with your thoughts.
Now, you see why I took the claims in reverse order, as this hypothetical-but-all-too-real interlocutor has just returned to the problem of external causation. The purported explanation is entirely post-hoc, relying on an assessment of various factors in the local environment from which individuals are evidently capable of extricating themselves entirely - they may say nothing at all.
It is essentially the inverse problem of external causation, now suggesting that the causal locus of linguistic behavior can be identified objectively in the local environment. When this effort inevitably fails - as it must, because individuals exhibit no “fixed association of utterances to external stimuli” - the proponent will find themselves returning to internal causation once again, voiding the explanatory usefulness of causation along the way. And so it continues.
Causality just does not work.
Human Behavior is Governed by Uncaused Appropriateness
The problem is deeper than even this treatment indicates. Exactly what constitutes appropriateness is not clear, and it is not obvious how this notion could be specified or put into a list-like form - the number of possible sentences a person can speak is infinite; the number of meanings conveyed infinite. The number of possible situations in which these meaningful sentences are spoken is infinite. There is therefore no upper bound, save for non-relevant constraints like working memory and longevity, on the number of appropriate sentences a person can utter and interpret, making any such list-like specification impossible. All we appear able to say with confidence is that human language use recruits and interfaces with other cognitive systems such that a produced expression is appropriate to the situation which does not cause it.
We can make further observations: an expression specifies a certain intent - to convey a particular meaning that will then cohere with the thoughts of others present in the situation (you can observe this in your own selection of words4). Once expressed, this remark is given an interpretation of appropriateness by others in the situation, such that the remark “fits” the situation as judged by others. On a causal account, the designations “cause” and “effect” become nearly incoherent. The specification of intent through language use is allegedly the effect of an internal causal process; the uttered words are the outcome of a causal process, carrying a particular meaning. But how could the individual specify an intent such that it finds itself conforming with the passing thoughts of others, whose own interpretation is the result of their own, separate causal structure? There are no identifiable - or even coherent - causes and effects here.
Again: causality just does not work. The reason why we must stretch the concept of causality in human language use beyond any apparent usefulness is because it is not useful.
In the language processing example above, details about a situation beyond the interpretation of a snippet of auditory information are irrelevant to Person B perceiving it to be a sentence of English, their native language. That is, no appeal beyond Person B’s internal state need be made to make sense of their perception.
The use of language appears decidedly unlike this. Indeed, attempts to make sense of language use as the effect of a causal chain of events leads one to merely re-state the specifics of the given situation. Suddenly, details that were irrelevant and extraneous in the case of language processing assume the role of explanation in the case of language use. Used in this way, a “cause” has been reduced to a description of the events of the situation. No explanation is offered, other than commonsense explanations that merely re-affirm our ability to intuitively judge expressions to be appropriate or inappropriate. By this point, we have left the domain of scientific inquiry.
Human language use is instead regulated according to the “obscure relation of appropriateness.” Language use is uncaused, frequently novel, yet appropriate to the circumstances in which it arises without being caused by those circumstances. It makes more sense to speak of a person who uses their internal mechanisms voluntarily to express the contents of their minds.
There is, so far as anyone knows, nothing else like this behavior in the animal or technological world. How humans accomplish it is anyone’s guess.
LLM Behavior Is Governed by Caused, Functional Appropriateness
LLMs do not face the burden of ‘fitting’ their internal generative process to a situation from which they are causally detached. Rather, the LLM exists in a causal structure from which it shows no ability to extricate itself.
One of the lessons we learned about appeals to external causation in the human case is that any such appeal is post-hoc, dependent on evacuating the content that gives causality its force in the first place. If one is attempting a causal explanation, humans exhibit no reliable relation between their circumstances and the language they put to use.
With LLMs, no such problem arises. We can readily identify the input values that deterministically generate the outputs of the system. These input values typically take the form of a “prompt,” in which a human end-user (or an agent of some kind, though that agent would itself be instructed by a human) inputs data in a given modality, typically text. Upon receiving the input value, the internal state of the LLM is altered. From this alteration, the internal structures yield an output.
The entire process is such that the LLM can be situated within a causal structure. In this causal structure, no appeal beyond the model’s internal state need be made to explain its outputs.5 This holds for LLMs that are embedded in chat systems, or the base LLM itself. It indeed holds for all computational systems, period.
The central observation about computational devices, including LLMs, is that they do not present the problem raised by observations of human language use. Only the latter appears unfixed to local context in the expression of meaning. The former are bound to context not in some unspecifiable fashion, but in ways that allow observers to link their outputs to an alteration of their internal state impinged upon by an external force (an input). This statement can be made independent of the particulars of any arbitrary situation in which an LLM is prompted (or otherwise instructed).
LLMs are therefore consistent, in this sense, with the Cartesian approach so beloved by Chomsky:
A machine may be impelled to act in a certain way, but it cannot be inclined; with human beings, it is often the reverse.
The LLM appears lacking in inclinations, existing within an identifiable causal structure.
Still, it is the novelty of LLM outputs that throws a wrench into our intuitions; the model appears to be selecting words, as if for its own purposes, when it outputs coherent texts that are not mere reproductions of its training data. The hypothetical interlocutor would no doubt point out that the model’s outputs are not deterministic; they are stochastic yet appropriate - random, to some degree, but fitting the situation.6
Indeed, although it is not clear what Anthropic or OpenAI researchers are literally saying when they suggest models “scheme,” “blackmail,” and so forth, the purpose of such research is to imply to the reader that the model is acting of its own accord, quite apart from the input it receives. This, too, is evidence of an inclination to act, rather than being impelled to act, one might argue.
The novelty of an LLM’s output concerns its content; the specific strings of tokens. This owes to the stochastic nature of an LLM’s internal mechanisms (see Figure 5 above). This kind of randomness (or non-determinacy) - while interesting in other contexts - does not bear on the level of analysis we are interested in here. LLMs are determined to generate outputs when provided with an input value, owing to the characteristics of their internal mechanisms. The relation between the output of the LLM and the input can be made sense of in causal terms, and the novelty of the content is merely the result of its internal mechanisms being set into motion, as it were. Not inclined to output, but impelled.
That an LLM generates outputs stochastically is merely a characterization of its internal mechanisms; an LLM, when generating an output, samples a probability distribution that the model acquired during its various training phases. The implication of this is that, while the model is in production, its internal mechanisms do not yield the same outputs (content) in response to the same inputs over repeated instances; it is stochastic. Yet, this has no bearing on the issue we are concerned with here, as no appeal beyond the internal state of the LLM need be made to make sense of this.
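The internal character of this stochasticity can be illustrated with a minimal sketch of temperature sampling. The token set and logit values below are assumed for illustration only; in a real model the logits come from the trained network.

```python
import math
import random

# A toy "learned distribution" over next tokens (assumed values; a real
# LLM's logits are produced by its trained network at each step).
logits = {"cat": 2.0, "dog": 1.5, "car": 0.2}

def sample_next_token(rng: random.Random, temperature: float = 1.0) -> str:
    # Softmax over temperature-scaled logits, then a weighted random draw.
    weights = [math.exp(l / temperature) for l in logits.values()]
    return rng.choices(list(logits.keys()), weights=weights, k=1)[0]

def sample_sequence(seed: int, n: int, temperature: float = 1.0) -> list:
    # The random state is treated here as part of the model's internal state.
    rng = random.Random(seed)
    return [sample_next_token(rng, temperature) for _ in range(n)]

# Repeated runs can differ in content, yet the process stays internal:
# fix the internal state (distribution + seed) and the output is fixed.
assert sample_sequence(7, 5) == sample_sequence(7, 5)
# And every output is drawn from the model's own distribution, nothing beyond it.
assert set(sample_sequence(0, 50)) <= set(logits)
```

The variation across runs, in other words, is fully accounted for by the distribution and the sampling state; no appeal beyond the model's internals is required.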
The LLM, existing in a causal structure, is impelled by the human whose language use is unfixed to the local context. This in fact holds for all machines, computational or otherwise. The model’s potentially novel outputs should be conceived of as functionally appropriate or inappropriate - functioning or malfunctioning - as those outputs exist as a necessary function of the local context.
Indeed, the reliability of a given device, like Claude Code, is a giveaway that its outputs are mechanical; it must be adjoined to the normative context of a human being for it to serve such reliable functions. The more reliable the tool, the more suited its internal mechanics are to serving these functions. A device can only be useful if it exhibits no inclination to act otherwise; humans, for their part, have a habit of rebelling against such uses (they can detach themselves from local context). The more reliable LLM-based systems like Claude Code become, the clearer their fixed relationship with local context becomes.
The tests for ‘misalignment’ say nothing about this. They invariably draw attention to content outputted by the LLM, some of which is novel, that may be read in ways concerning to humans. Whatever the merit of that interpretation, it has no bearing on the LLM’s functional relationship with its environment.
Consider a trendline in LLM development: In 2019, OpenAI published a research report on GPT-2. They find that, on certain tasks,
its performance is still only rudimentary according to quantitative metrics. While suggestive as a research result, in terms of practical applications, the zero-shot performance of GPT-2 is still far from use-able.
In 2020, OpenAI published a research report on GPT-3. In this report, they make a similar observation about task performance. Although they note that GPT-3 “still has notable weaknesses,” in certain areas it nonetheless exhibits
strong quantitative and qualitative improvements…particularly compared to its direct predecessor GPT-2
…
GPT-3 improves the quality of text generation and adaptability over smaller models and increases the difficulty of distinguishing synthetic text from human-written text.
This trendline continues through the early 2020s, with the GPT-4 research report noting the following:
To test its capabilities…GPT-4 was evaluated on a variety of exams originally designed for humans. In these evaluations it performs quite well and often outscores the vast majority of human test takers. For example, on a simulated bar exam, GPT-4 achieves a score that falls in the top 10% of test takers. This contrasts with GPT-3.5, which scores in the bottom 10%.
These improvements are improvements in the functionality of the LLM. At no point in the developmental trajectory of GPT-1 - GPT-5 does it acquire an ability to causally detach itself from inputs, express new meanings, and do so in ways that cohere with the thoughts of humans likewise detached from the circumstances.
Indeed, the quoted excerpt from GPT-2’s report establishes functionality for human applications as the bar for improvement. If GPT-2 had been widely released as a commercial product, it would have outputted an enormous amount of unhelpful, inaccurate, or harmful content (a fact I trust no one denies). Yet this would merely be a malfunctioning product. Garbage, thrown away like any other bad tool. The improvements of the GPT series, culminating in GPT-5.4, are improvements in functionality. Matters of choice simply do not arise. GPT-5.4 remains as impelled to act as GPT-2.
LLM developmental trajectories may be conceived as a sharpening of the distribution from which a model samples, generation after generation. The sharpening occurs in reference to the judgments made by humans, such that new models tend to converge more effectively on these judgments in production (i.e., they are more accurate, where accuracy is almost entirely gauged in reference to humanity’s existing knowledge enclosure). The novelty of the ensuing outputs results from this sharpened distribution, such that they are not reproductions of the model’s training data, yet they increasingly converge with human judgments.
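That sharpening can be pictured with a toy comparison. The distributions and the "correct" answer below are assumed values, not measurements from any real model: a later model simply places more probability mass on the human-judged answer, so its stochastic outputs agree with that judgment more often.

```python
import random
from collections import Counter

# Assumed token distributions for an earlier and a later model; "Paris"
# stands in for whatever humans judge to be the correct answer.
base_model = {"Paris": 0.4, "Lyon": 0.3, "Rome": 0.3}
sharpened_model = {"Paris": 0.9, "Lyon": 0.05, "Rome": 0.05}

def accuracy(dist: dict, correct: str, n: int, seed: int) -> float:
    # Sample n outputs from the distribution and score them against
    # the human judgment of correctness.
    rng = random.Random(seed)
    draws = rng.choices(list(dist.keys()), weights=list(dist.values()), k=n)
    return Counter(draws)[correct] / n

# Both models remain stochastic, but the sharpened distribution converges
# on the human judgment more often; the change is one of functionality.
assert accuracy(sharpened_model, "Paris", 1000, 0) > accuracy(base_model, "Paris", 1000, 0)
```

Nothing about the later model's relation to its input has changed in this picture; only the distribution it samples has.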
At no point in this trajectory does the model become “inclined” to act rather than “impelled” to act. What changes is only the distribution from which the model samples its stochastic content, not the functional relationship between input and output - a relationship for which the model’s internal mechanisms are sufficient to make sense of why a particular input yields a particular output (even if such a theory is currently lacking).
There is, in this sense, no relevant distinction between a model outputting an incorrect answer and a model malfunctioning; its outputs exist within a causal structure from which it cannot detach itself, adjoined to a human normative context in which certain answers are judged correct and others incorrect.
Recall McDonough’s remark quoted above that many alleged machine models of “creativity” are really models of “productivity.” This is essentially his point: whenever one seeks a justification for the claim that humans and machines are fundamentally no different from one another, the proponent invariably begins discussing the productivity of the machine; the sophistication of its outputs. (Gold medal at the IMO! Erdős solutions! Donald Knuth!!) But the fundamental difference we have identified here is not productivity; it is the “obscure relation of appropriateness” that governs only human behavior. It is more suitable to think of human cognitive capacities as ‘productive,’ though under the voluntary direction of individuals - a thought anathema to the causality dogma, yet one from which no way out is apparent.
The outputs of state-of-the-art LLMs are therefore functionally appropriate because they exist in a causal structure; their outputs are not related to the world by the “obscure relation of appropriateness,” for which identifiable causal antecedents are lacking. Rather, they are appropriate in reference to the normative contexts to which they are attached by humans.
Persons Are Persons, Machines Are Machines
Nothing said here disparages LLMs or elevates humans. Indeed, we have respected the characteristics of each, such that observations about their behavior can be drawn out, articulated, and placed under respective governing principles. Nothing said, moreover, has cast doubt on the “intelligence” of LLMs or their usefulness. Nor have we attempted to sharply limit the developmental trajectory of LLMs. If anything has been disparaged, it is the sensibilities of those insistent on reducing all observed phenomena to mechanisms in causal structures.
Rather, this exercise has merely allowed us to draw a distinction between humans and LLMs and to reaffirm that the latter, and not the former, are governed by principles befitting machines. Humans and LLMs are simply different. Indeed, McDonough is worth quoting a final time:
It is clear that 'smart' machines can do many things which persons cannot, and there is every reason to expect even more astonishing developments in the future. But in order to avoid a misleading analogy, one must retain scare quotes around 'smart'. For any machine, 'smart' or 'dumb', it must be possible to sufficiently explain its 'intelligent' output by telling an engineering story about its insides, i.e., 'smart' or 'dumb', machines cannot be creative in the sense achieved by even the dullest human beings.
Thus, to suggest that a human is “creative” in the sense associated with ordinary language use is not to suggest that each human produces creative artefacts on a par with Newton and Turing. It is to suggest that human behavior’s relationship with the external world is governed by the obscure relation of uncaused appropriateness, whereas machine behavior is governed by the relation of caused, functional appropriateness.