Stochastic Parrots 🦜: Frequently Unasked Questions


Emily M. Bender

It’s been a bit over five years since the Stochastic Parrots paper (Bender, Gebru, et al. 2021) was published (and somewhat longer since Google made it an enormous news story by firing my co-authors). During that time, I have been watching the phrase stochastic parrot(s) on social media, initially out of linguistic interest (it’s rare to get to see how a coinage develops from its very beginning). In the early days, most usage I saw was from people referring to the paper, and then from people who had read the paper referring to large language models as stochastic parrots. Eventually, though, the phrase outran the paper, as people picked it up as a way to refer to LLMs.

Tracking this phrase also provides a window into parts of the online discourse about “AI” that I would otherwise be unlikely to see. In that discourse, I see a lot of misconceptions about a) how large language models work and b) my own work on this topic. Accordingly, it seems like a fitting time to do some debunking, answering questions that people frequently fail to ask. What you’ll find below aren’t questions, but rather the various statements that people make when perhaps they should have stopped and asked a question.

To keep this grounded in the actual text in question, here is where we introduce the term in the original paper:

Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do [89, 140]. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot. (p.616–617)

The phrase stochastic parrots was one attempt (among several) to make vivid what it is that large language models, when used to synthesize text, are doing. In later work (Mystery AI Hype Theater 3000, The AI Con), I’ve also added synthetic text extruding machine as a way to describe systems that closely model which bits of words tend to co-occur in their input data and can be used to, well, extrude synthetic text.

Bender says “AI is a stochastic parrot”

I have never and will never say that “AI” is a stochastic parrot, because I reject “AI” as a way to describe technologies (LLMs or otherwise). Also, the Stochastic Parrots paper, written in Sept-Oct 2020, was not a paper about “AI” at all, but a paper about the risks and harms associated with the drive for ever larger language models, which, at that point, mostly weren’t being used to extrude synthetic text. (OpenAI had made GPT-2 and GPT-3 available for playing with, but this was still two years before they imposed ChatGPT on the world and synthetic text suddenly became everyone’s problem.) The term “AI” appears only once, near the end of the paper, where we write:

Work on synthetic human behavior is a bright line in ethical AI development, where downstream effects need to be understood and modeled in order to block foreseeable harm to society and different social groups. (p.619)

I believe this particular insight, and its phrasing, is due to Margaret Mitchell (aka Shmargaret). In the years since, this observation has unfortunately been repeatedly reinforced: work on synthetic human behavior continued apace, and the foreseeable harms (predictably) came to pass.

Bender says [some model] is “just” a stochastic parrot

Indulge me in a little digression into linguistics here. The word just is the kind of word that evokes a scale or ranking. For example, She is just 5 feet tall places her on a scale of height and furthermore suggests that her height is further down that scale than would be expected or desirable or just normal/normative. So someone who says that I say that some model is “just” a stochastic parrot is also attributing a scale, perhaps of functionality (or, in the anthropomorphizing language I am always struggling against, “capability”), and asserting that I am placing whatever model in the wrong, or at least a surprisingly low, spot on that scale.

This misunderstands what I was doing with the phrase stochastic parrots, and what we were doing in that paper in general. While I can’t speak for my co-authors, I am not invested in the project of “AI”, do not see it as a goal that is worthwhile (nor feasible) to work towards, and am not measuring large language models against some scale of progress towards that goal. What I am trying to do, in a world absolutely saturated with marketing selling the idea that the synthetic text extruding machines are “AI”, or maybe even “AGI”, is to help people understand what these systems actually are: systems designed to mimic the language (specifically: linguistic forms) that people use.

An important related point here is that though all of these systems (Claude, Gemini, ChatGPT, etc) have LLMs specifically designed to produce synthetic text as key components, that doesn’t mean there aren’t other components, as Margaret Mitchell also points out. Most things we historically do with computing are not well approximated by extruding synthetic text. Accordingly, if a company’s goal is to portray their product as functional, they would be well advised, for example, to run text classification systems on user input to intercept any arithmetic queries and route those to an actual calculator.
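To make that routing idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the function names calculate, route, and synthesize_text are hypothetical, and the “classifier” is just a check for whether the input parses as bare arithmetic, standing in for a real text classification system.

```python
import ast
import operator

# Only these binary operators are allowed; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculate(expr: str) -> float:
    """Safely evaluate a bare arithmetic expression like '12 * (3 + 4)'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def synthesize_text(prompt: str) -> str:
    # Stand-in for a call to a synthetic text extruding machine.
    return "<synthetic text would be extruded here>"

def route(user_input: str) -> str:
    # Crude "classifier": if the input parses as pure arithmetic,
    # compute the answer; otherwise fall through to the LLM.
    try:
        return str(calculate(user_input))
    except (ValueError, SyntaxError):
        return synthesize_text(user_input)

print(route("12 * (3 + 4)"))    # '84': computed, not predicted
print(route("Tell me a joke"))  # handed off to the text extruder
```

The point of the sketch is only that arithmetic is better served by actual computation than by a system that predicts plausible-looking strings of digits; a production router would of course need a classifier that can pull the arithmetic out of a natural-language question.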

“Stochastic parrot” is a critique of LLMs/“AI”

I often see people talking about “the stochastic parrots critique of LLMs,” but this, too, misapprehends at least the way I use the phrase. (This may be an accurate description of how other people use it.) I definitely take a critical view on the project of “AI”, and on the ways in which people are using synthetic text extruding machines (aka LLMs). But the target of my criticism is not the models. Rather, I am concerned about the actions of people: the data theft, the exploitative labor practices, the haphazard creation of and failure to document datasets, the complete disregard for environmental impact, and the astonishing willingness of so many to surrender their own power and turn to synthetic text (for which no one is accountable) for all kinds of weighty decisions.

“Stochastic parrot” is an insult

Another common trope in the discourse around this phrase is to claim that stochastic parrot is an insult (or even a slur). On one reading, that would require LLMs to be the kind of thing that can take or feel offense, which they clearly aren’t. But, indeed, it is also possible to insult someone’s work, or a consumer product they have acquired, etc. At which point, I refer the reader to the previous two points.

Folks have also pointed out that this coinage is somewhat unfair to actual parrots who, for all I know, do have internal lives and do use their ability to mimic human speech with some kind of communicative intent. My best answer here is to say that (despite parrot in stochastic parrot being a noun), I am drawing not on the name of the bird directly but rather on the English verb to parrot, which means to repeat back without understanding.

It can’t be a stochastic parrot, it’s come up with something new!

This one misses the role of stochastic in stochastic parrot, which means randomly, according to some probability distribution. What comes out of these systems is not usually a direct regurgitation of their input, but rather a remix of it. This remix is shaped by the specific ways in which the systems were built (“trained”) through multiple steps, by the “system prompt” (a prompt prepended to user input that the user doesn’t usually see), and by the user input itself. In other words, these systems make papier-mâché of their training data, molded around the balloons of these other components.
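To make the stochastic part vivid, here is a toy sketch in Python. The four-token distribution is made up for illustration; a real model assigns probabilities over a vocabulary of tens of thousands of word pieces, but the sampling step works the same way:

```python
import random

# Made-up next-token probabilities for the context "the parrot";
# a stand-in for a real model's distribution over a huge vocabulary.
next_token_probs = {"said": 0.40, "repeated": 0.25, "flew": 0.20, "squawked": 0.15}

def sample_next_token(probs: dict[str, float]) -> str:
    """The 'stochastic' step: draw the next token at random, weighted by probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Same distribution, different draws: output varies from run to run.
for _ in range(3):
    print("the parrot", sample_next_token(next_token_probs))
```

Because each step is a weighted draw rather than a lookup, long runs of output need not match anything in the training data verbatim, which is why novel-looking text is entirely compatible with the stochastic parrot description.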

The stochastic parrots argument [is wrong, is out of date, etc]

This one is funny because it comes up, in the same form, every time one of the companies promotes a new model. “Stochastic parrots might have been an accurate description in [year], but not anymore because…” and then reference to whatever demo the author has been impressed by. This is framed as heralding the arrival of “real” “AI” — over and over and over again.

But stochastic parrots (in my writing at least) isn’t an argument. It’s a description or a metaphor, again an attempt to make vivid what language-mimicking machines do.

The stochastic parrots hypothesis has been disproved

Stochastic parrots also does not refer to an empirical hypothesis. Accordingly, it doesn’t make sense to say it’s been “disproved” or that it is “unfalsifiable”.

The closest thing to a hypothesis in this space in my writing is the argument (again, not an empirical hypothesis) in Bender and Koller 2020, the one with the octopus thought experiment. The Stochastic Parrots paper refers to this earlier paper, which lays out the argument that language models don’t understand the text they are used to process, because language models only ever have access to the linguistic form (i.e. spellings of words) in the training data.

In that paper, we provide a definition of understanding as mapping from language to something outside of language, and show that systems built only with linguistic form have no purchase with which to encode (“learn”) such a mapping.

They’re not stochastic parrots because they’re multi-modal models now

Stochastic parrots was coined to refer to language models, i.e. systems trained only on linguistic form and used to mimic the kinds of sequences of linguistic form that people use. It is true that image/text models, for example, which can be used to map from linguistic strings to images or vice versa, can be argued to meet the definition of understanding in Bender & Koller 2020 — albeit in an extremely thin way. But the stochastic parrots framing is still extremely relevant to these models, as well as to systems built with them. As quoted above:

we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence

When we look at the text from an image/text model, we make sense of it in a way that is rich and socially situated, and we must not project that onto the model if we want to keep a clear-eyed view of how such models actually function (and in what circumstances we should be willing to use them). Similar things can be said about the images, too, though it’s generally not linguistic competence per se through which they are experienced.

As we write in the Stochastic Parrots paper:

The ersatz fluency and coherence of LMs raises several risks, precisely because humans are prepared to interpret strings belonging to languages they speak as meaningful and corresponding to the communicative intent of some individual or group of individuals who have accountability for what is said. (p.617)

User interfaces, if well designed, should be transparent in the sense of providing the user with clear information about what the system can reliably do. Even if there is some thin kind of technical “understanding” in e.g. a text/image model, the fact that it’s using our language at all will send misleading signals about what is actually going on, so long as we relate to language as we always do (and I don’t see how we can avoid doing so).

“Stochastic parrots” doesn’t capture the political economy

This one stands out because it tends to come from other folks who are critical of “AI” but are impatient with criticism that doesn’t come from their own lens. Of course the phrase stochastic parrots isn’t a sociological critique of the way these systems are being used by corporations (and governments) to discipline labor and centralize power. It seems like a category error to ask that of a phrase coined to try to make vivid the basic functionality of the software. If you want sociological analysis, I recommend The AI Con, co-authored with a sociologist (the amazing Dr. Alex Hanna).

Bender didn’t actually coin the phrase

I came up with this phrase as we were writing the paper, and then wondered if I had heard it somewhere. As of early October 2020, a Google search for “stochastic parrot” provided 0 hits. I also asked around on social media. It turns out there are two quasi-antecedents:

In a Daily Nous post from July 2020 (which I had not read prior to my coinage), Regina Rini writes:

So long as we get what we came for — directions to the dispensary, an arousing flame war, some freshly dank memes — then we won’t bother testing whether our interlocutor is a fellow human or an all-electronic statistical parrot.

That’s the shape of things to come. GPT-3 feasts on the corpus of online discourse and converts its carrion calories into birds of our feather.

The more direct inspiration for me was an email from Stuart Russell in September 2020 to Alexander Koller and me about our ACL 2020 paper (the one with the octopus thought experiment):

I have been watching with increasing disbelief as the NLP community becomes more and more enamored of its randomized parrots. The paper is a breath of sanity.

Once I made the connection, I emailed Russell to offer a footnote acknowledging it in the Stochastic Parrots paper (in October 2020, before the paper was publicly known). He declined.

Bender is just mad because her life’s work/her field has been made obsolete

On a related note, my view into the discourse sometimes turns up a “sour grapes” argument, wherein people think my motivation for “critiquing LLMs/‘AI’” (see above) is that I’m just salty because my work as a linguist (and more specifically on grammar engineering, which is a symbolic rather than statistical approach to natural language processing) is somehow upended or made obsolete by LLMs. I can promise you that it is still interesting and worthwhile to study how language works and how we work with language, and to use computers to do so. And in fact, the field of linguistics is particularly relevant in this moment, as a linguist’s eye view on language technology is desperately needed to help make wise decisions about how we do and don’t use these products.