There is something faintly insulting about being compared to a parrot.
Not a majestic eagle, or even a thoughtful owl. A parrot! When critics of language models call them stochastic parrots, they deliver a double sting: the machine merely imitates language and it does so blindly.
The implication is clear: A stochastic parrot should not make sense.
Yet, here we are.
Machines are producing essays, arguments, poetry, software and even jokes. They can explain the causes of the French Revolution, describe the mating habits of pandas, and argue both sides of a philosophical debate about whether they themselves possess understanding.
Why? Why does the stochastic parrot make sense at all? It would not be surprising if it made sense occasionally. But it makes sense across domains, reliably and verifiably. At what point do we question our assumptions?
The coin keeps landing tails, toss after toss: either our theory of probability needs revising, or the coin isn’t fair!
If mere statistics over text can produce something that looks like intelligence, then either language encodes far more meaning than we ever recognized, or the inference of meaning itself is not as profound a cognitive feat as we assume.
Both these conclusions are troublesome in different ways.
Language operates through patterns: patterns of grammar, associations, expectations, and cues. Words appear in familiar neighborhoods. If I were to use schadenfreude in this essay, you would notice! LLMs exploit this structure ruthlessly. Given enough data, these patterns reveal themselves. Grammar emerges as statistical regularity. Topic associations emerge as clusters. Argument structures emerge as repeated rhetorical forms.
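To make that concrete, here is a minimal sketch, assuming nothing more than a handful of made-up sentences: simply counting which word follows which already yields a crude grammatical rule and a crude topical neighborhood. It illustrates the idea of patterns emerging from data, not how an actual LLM is built.

```python
# A toy illustration of patterns emerging from text: count which word follows
# which in a tiny corpus, then use the counts as a crude next-word "prediction".
# The corpus is invented for illustration; real models learn the same kind of
# regularities from vastly more data, with far richer machinery.
from collections import Counter, defaultdict

corpus = ("the cat drank the water . the dog drank the water . "
          "the cat chased the dog").split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

print(following["drank"].most_common(1))  # [('the', 2)] -- a grammatical regularity
print(following["the"])                   # cat, water, dog -- a topical neighborhood
```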
So what appears to us as meaning may, at least in part, be the large-scale statistical shape of language itself.
The philosopher Ludwig Wittgenstein famously suggested that meaning is not some hidden essence residing inside words. Instead, meaning is determined by use. To understand a word’s meaning is simply to know how it is used in the language. If you can participate in the game of language correctly, then you understand it.
LLMs are clearly good at participating in the game. They produce sentences that follow grammatical rules, maintain topical coherence, and respond plausibly to questions. If a mimicry of language patterns, however sophisticated, can do this, what does that say about the source material?
Language is a tool we developed to convey meaning; that is its one and only purpose. The tool does its job very well. So well, in fact, that unlike the proverbial soul-stealing camera, it ends up capturing more than the meaning we intended. How much more? Enough to call it reasoning? Enough to call it intelligence? Enough to no longer be classified as a parrot?
Are we worrying about the wrong tool being intelligent? Are we endlessly debating whether LLMs are sentient or intelligent while ignoring the very tool in which we encoded our meaning and intelligence, painstakingly, over millennia: a tool that lacked a way to talk back? Until LLMs gave it one.
Or maybe the coin was never fair.
Meaning was never sacred.
Children are not handed dictionaries with meanings at birth. They hear millions of sentences over several years and gradually infer how words behave. They learn which words tend to appear together, which constructions are grammatical, and which utterances are appropriate in different contexts. In other words, they learn patterns before meaning. For example, babies point at water and learn to say water as part of social conditioning before inferring that they should use the word when they want some.
Try looking at a glass of water without thinking of the word water: Is that a prediction engine inside you?
Our conversations are full of phrases that are grammatically correct, contextually appropriate, and socially acceptable without being deeply examined for truth. We adopt phrases from books, lectures, and conversations. Our thoughts are often constructed from fragments of language circulating through culture.
In this sense, is human discourse also partly parroting?
Of course, our mental model is not limited by words. We inhabit a physical world. We touch water, feel gravity, experience hunger. Our words and their meanings are grounded in perception and action. When we say “apple”, we are not merely referring to a pattern in the language but to an object we have seen, tasted, and dropped on the floor. Language models lack this grounding entirely. For them, the word “apple” is surrounded by other words: “fruit”, “tree”, “orchard”, “pie”, “iPhone”.
The model does not know what an apple is, only how people talk about apples.
The word “know” is doing a lot of work in that sentence. The LLM knows how to use the word correctly in multiple contexts. So, in a way, it keeps the meaning of the word in the surrounding words. And how can we be certain that, for all our grounding of the word in the physical world, we are not likewise keeping its meaning in its contextual cousins?
If we are expecting apple the fruit but figure out mid-sentence that the context is Apple the company, we are fully capable of taking it in our stride. If anything, many of us enjoy puns! Is it the variety in the “distance” from apple that makes the pun funny?
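One way to picture that “distance” is with word vectors. The sketch below uses tiny, made-up vectors (real models learn vectors with thousands of dimensions from data), purely to illustrate how apple-the-fruit can sit close to “pie” and far from “iPhone”, and how a pun trades on that gap.

```python
# Hypothetical three-dimensional "word vectors", invented for illustration only,
# and a cosine-similarity measure of how close two senses sit in meaning-space.
import math

vectors = {
    "apple_fruit":   [0.90, 0.10, 0.80],
    "apple_company": [0.10, 0.90, 0.20],
    "pie":           [0.80, 0.00, 0.90],
    "iphone":        [0.20, 0.95, 0.10],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["apple_fruit"], vectors["pie"]))     # ~0.99: same neighborhood
print(cosine(vectors["apple_fruit"], vectors["iphone"]))  # ~0.30: the gap a pun exploits
```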
Have you noticed how the second hand of a clock seems to tick slower for a moment when you look directly at it after glancing away? It is a trick of the brain, which backfills your experience to avoid a hole in cognition. Similarly, consider the Hollow Mask Illusion: our brains have a convexity bias, because most objects in our world are convex, and they are specialized in recognizing faces. So the brain assumes and interprets a face protruding outward, overriding sensory cues like shadows and perspective.
The brain predicts sensory signals. It resolves ambiguity by consistently selecting the most probable interpretation.
The predictive processing theory of the brain proposes that cognition itself is fundamentally prediction: the brain constantly anticipates incoming sensory information and adjusts its internal models when predictions fail, thoughts and words included. If this theory is correct, then prediction is not a trivial operation; it is the central mechanism of intelligence.
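In its barest form, that loop of predicting, registering surprise, and revising looks something like the sketch below. The sensory stream, learning rate, and update rule are illustrative assumptions, not a model of any actual neural circuit; they only show prediction error driving the update of an internal model.

```python
# A minimal sketch of prediction-error-driven updating, loosely in the spirit of
# predictive processing. All numbers here are made up for illustration.
sensory_stream = [4.9, 5.1, 5.0, 9.8, 10.1, 10.0]  # the world abruptly changes midway
belief = 0.0          # the current internal prediction of the signal
learning_rate = 0.5   # how strongly a prediction error revises the belief

for observation in sensory_stream:
    prediction_error = observation - belief      # the surprise
    belief += learning_rate * prediction_error   # adjust the internal model
    print(f"saw {observation:4.1f}, now expecting {belief:5.2f}")
```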
Is what we call ‘profound meaning’ actually just a cognitive shortcut? A trick of the brain that makes the mundane feel deep?
If:
We insist that LLMs do not understand meaning, and hence,
That meaningful discourse can be simulated without genuinely understanding meaning,
Then:
How much of our communication depends on true understanding of meaning?
We must now contend with The Problem of Other Minds. If parrots are not intelligent, and we are at least partly parrots, are we only partly intelligent? And if I cannot be certain that a machine truly understands, on what basis am I certain that you do?
The coin keeps landing tails, toss after toss: either our theory of probability needs revising, or the coin isn’t fair.
But there is a third possibility: There is no coin. The analogy is incorrect.
Coin-tossing is the canonical example for testing a theory of probability. But how and when did we decide:
That such a thing as Objective Meaning exists,
That it is stable and shared,
That we know it and agree what it looks like, and most importantly,
That producing coherent text is the ultimate test of whether machines understand such an objective meaning.
There is a presumption that the meaning I intend when I speak is roughly the meaning you recover when you listen. It is the bedrock of human discourse. It’s the whole point of the evolution of language!
And yet, this presumption has never really been proven.
This isn’t about miscommunication. Meaning isn’t a fixed object that language faithfully transmits from one mind to another. We know that because we study fallacies and have theories of the indeterminacy of translation. This is something else: the meaning I produce when I speak may not be the meaning I myself reconstruct when I later hear the same words. And the meaning you reconstruct may differ again - subtly, invisibly, continuously, even unintentionally.
We assume there is an Objective Meaning because, if we didn’t, society would collapse. If I couldn’t trust that your “red” is my “red”, we couldn’t communicate. So we didn’t really find objective meaning so much as legislate it into existence by common agreement. Now that machines trained on the same legislation agree as well, we are torn between welcoming them and questioning them. Yet it is just an agreement. The fact that the machine arrives at it approximately and mathematically need not confer upon it any quality beyond the one it exhibits.
So, we do not know that Objective Meaning exists; it definitely isn’t stable and shared - rather it is a legislation of approximate agreement; and we do not know what it looks like. And yet we’ve arrived at a test for it: producing coherent text.
Humans can produce incoherent rants while deeply understanding a topic, as in a passionate but rambling expert; or, conversely, generate polished nonsense, as in some political speeches. If coherence alone were the benchmark, we’d misjudge both humans and machines. Linguists have long argued that grammar and meaning are decoupled in language; Chomsky’s “Colorless green ideas sleep furiously” is grammatically impeccable and semantically empty. At the very least, we can assume that producing grammatically structured language is not sufficient evidence of understanding meaning.
So, what would constitute sufficient evidence?
Well, we haven’t rigorously defined what constitutes it for us!
Individuals exhibit a wide range of cognitive abilities, and many struggle with generalization to novel contexts, counterfactual reasoning, or handling ambiguity, without our revoking their status as “intelligent” beings. We might label someone as having a specific cognitive deficit or a lower IQ, but we never question their fundamental capacity to understand meaning.
We are holding LLMs to a standard that we do not hold human intelligence to, and then being doubly surprised that they are passing an unfair benchmark. But a wrong criterion is a wrong criterion even when it’s wrong in the preferred direction - a test does not become valid simply because it is severe.
We know that the brain does not treat information symmetrically. We know that the mechanism to store information in the brain is optimized separately from the mechanism to retrieve it. When we speak, we aren’t downloading a file of meaning; we are reinterpreting our stored data on the fly.
Memory is not replay - it is reconstruction.
It is reasonable to ask: Does this asymmetry also extend to meaning?
It would mean that meaning is not a static object we hold, it is a performance we do.
Communication, then, works not because meanings are identical but because they are similar enough for coordination.
Between “meaning is all in the use” and “meaning is grounded in reality”, meaning lies on a spectrum, a legislation by common agreement: a fluid consensus among societies, babies, readers, writers, lawyers, actors, poets, and, yes, parrots. Whether that consensus arises from higher-order intentionality or from a hard-coded objective to answer a chat question is beside the point.
The stochastic parrot is participating in the same imperfect game the rest of us have always been playing. As long as it adheres to the agreement, we have to let it take a seat at the table.
Does it understand the meaning of the game? Depends on what meaning means.