It’s been a bit over five years since the Stochastic Parrots paper (Bender, Gebru et al. 2021) was published (and somewhat longer since Google made it an enormous news story by firing my co-authors). During that time, I have been watching the phrase stochastic parrot(s) on social media, initially out of linguistic interest (it’s rare to get to see how a coinage develops from its very beginning). In the early days, most usage I saw was from people referring to the paper, and then from people who had read the paper referring to large language models as stochastic parrots. Eventually, though, the phrase outran the paper, as people picked it up as a way to refer to LLMs.
Tracking this phrase also provides a window into parts of the online discourse about “AI” that I would otherwise be unlikely to see. In that discourse, I see a lot of misconceptions about a) how large language models work and b) my own work on this topic. Accordingly, it seems like a fitting time to do some debunking, answering questions that people frequently fail to ask. What you’ll find below aren’t questions, then, but the various statements that people make, when perhaps they should have stopped and asked a question.
To keep this grounded in the actual text in question, here is where we introduce the term in the original paper:
Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do [89, 140]. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot. (p.616–617)
The phrase stochastic parrots was one attempt (among several) to make vivid what it is that large language models, when used to synthesize text, are doing. In later work (Mystery AI Hype Theater 3000, The AI Con), I’ve also added synthetic text extruding machine as a way to describe systems that closely model which bits of words tend to co-occur in their input data and can be used to, well, extrude synthetic text.
Bender says “AI is a stochastic parrot”
I have never and will never say that “AI” is a stochastic parrot, because I reject “AI” as a way to describe technologies (LLMs or otherwise). Also, the Stochastic Parrots paper, written in Sept–Oct 2020, was not a paper about “AI” at all, but a paper about the risks and harms associated with the drive for ever larger language models, which, at that point, mostly weren’t being used to extrude synthetic text. (OpenAI had made GPT-2 and GPT-3 available for playing with, but this was still two years before they imposed ChatGPT on the world and synthetic text suddenly became everyone’s problem.) The term “AI” appears only once, near the end of the paper, where we write:
Work on synthetic human behavior is a bright line in ethical AI development, where downstream effects need to be understood and modeled in order to block foreseeable harm to society and different social groups. (p.619)
I believe this particular insight, and its phrasing, is due to Margaret Mitchell (aka Shmargaret). In the years since, this observation has unfortunately been repeatedly reinforced: work on synthetic human behavior continued apace, and the foreseeable harms (predictably) came to pass.
Bender says [some model] is “just” a stochastic parrot
Indulge me in a little digression into linguistics here. The word just is the kind of word that evokes a scale or ranking. For example, She is just 5 feet tall places her on a scale of height and furthermore suggests that her height is further down that scale than would be expected or desirable or just normal/normative. So someone who says that I say that some model is “just” a stochastic parrot is also attributing a scale, perhaps of functionality (or, in the anthropomorphizing language I am always struggling against, “capability”), and asserting that I am placing whatever model in the wrong, or at least a surprisingly low, spot on that scale.
This misunderstands what I was doing with the phrase stochastic parrots, and what we were doing in that paper in general. While I can’t speak for my co-authors, I am not invested in the project of “AI”, do not see it as a goal that is worthwhile (nor feasible) to work towards, and am not measuring large language models against some scale of progress towards that goal. What I am trying to do, in a world absolutely saturated with marketing selling the idea that the synthetic text extruding machines are “AI”, or maybe even “AGI”, is to help people understand what these systems actually are: systems designed to mimic the language (specifically: linguistic forms) that people use.
An important related point here is that though all of these systems (Claude, Gemini, ChatGPT, etc.) have LLMs specifically designed to produce synthetic text as key components, that doesn’t mean there aren’t other components, as Margaret Mitchell also points out. Most things we historically do with computing are not well approximated by extruding synthetic text. Accordingly, if a company’s goal is to portray their product as functional, they would be well advised, for example, to run text classification systems on user input to intercept any arithmetic queries and route those to an actual calculator.
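To make the routing idea concrete, here is a minimal sketch in Python. This is my own illustration, not any vendor’s actual architecture: the regex stands in for what would, in a real product, be a trained text classifier, and the hand-off to the LLM is stubbed out.

```python
import re

def route(user_input: str) -> str:
    """Toy dispatcher: intercept bare arithmetic queries and hand them
    to a real calculator instead of the text-synthesis component.
    Illustrative only; real systems use trained classifiers, not regexes."""
    # Crude stand-in for a classifier: does the input look like
    # "<number> <op> <number>"?
    match = re.fullmatch(r"\s*(\d+)\s*([+\-*])\s*(\d+)\s*", user_input)
    if match:
        a, op, b = match.groups()
        results = {"+": int(a) + int(b),
                   "-": int(a) - int(b),
                   "*": int(a) * int(b)}
        # Deterministic arithmetic, not extruded text.
        return str(results[op])
    # Everything else would be passed to the LLM; stubbed out here.
    return "[routed to the synthetic text extruding machine]"

print(route("12 + 30"))          # prints 42
print(route("Tell me a story"))  # routed to the stub
```

The point of the sketch is just that the product is a pipeline of components, only one of which is a language model, and that the parts users find most reliably “smart” may be the ones that are not LLMs at all.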
“Stochastic parrot” is a critique of LLMs/“AI”
I often see people talking about “the stochastic parrots critique of LLMs,” but this, too, misapprehends at least the way I use the phrase. (This may be an accurate description of how other people use it.) I definitely take a critical view on the project of “AI”, and on the ways in which people are using synthetic text extruding machines (aka LLMs). But the target of my criticism is not the models. Rather, I am concerned about the actions of people: the data theft, the exploitative labor practices, the haphazard creation of and failure to document datasets, the complete disregard for environmental impact, and the astonishing willingness of so many to surrender their own power and turn to synthetic text (for which no one is accountable) for all kinds of weighty decisions.
“Stochastic parrot” is an insult
Another common trope in the discourse around this phrase is to claim that stochastic parrot is an insult (or even a slur). On one reading, that would require LLMs to be the kind of thing that can take or feel offense, which they clearly aren’t. But, indeed, it is also possible to insult someone’s work, or a consumer product they have acquired, etc. At which point, I refer the reader to the previous two points.
Folks have also pointed out that this coinage is somewhat unfair to actual parrots who, for all I know, do have internal lives and do use their ability to mimic human speech with some kind of communicative intent. My best answer here is to say that (despite parrot in stochastic parrot being a noun), I am drawing not on the name of the bird directly but rather on the English verb to parrot, which means to repeat back without understanding.
It can’t be a stochastic parrot, it’s come up with something new!
This one misses the role of stochastic in stochastic parrot, which means randomly, according to some probability distribution. What comes out of these systems is not usually a direct regurgitation of their input, but rather a remix of it. This remix is shaped by the specific ways in which the systems were built (“trained”) through multiple steps, by the “system prompt” (a prompt prepended to user input that the user doesn’t usually see), and by the user input itself. In other words, these systems make papier-mâché of their training data, molded around the balloons of these other components.
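As a small illustration of the stochastic part, here is a Python sketch of sampling a next word from a probability distribution. The vocabulary and probabilities are invented for illustration; a real model computes its distribution from the whole preceding context over a vocabulary of tens of thousands of tokens.

```python
import random

# Invented toy distribution over possible next words, conditioned
# (implicitly) on the prefix "The sky is".
next_word_probs = {"blue": 0.6, "cloudy": 0.3, "plaid": 0.1}

def next_word(rng: random.Random) -> str:
    """Sample one next word according to the probability distribution."""
    return rng.choices(
        population=list(next_word_probs),
        weights=list(next_word_probs.values()),
        k=1,
    )[0]

rng = random.Random(0)  # fixed seed for reproducibility
continuations = [next_word(rng) for _ in range(5)]
print("The sky is ...", continuations)
# Different seeds give different continuations: the output is a
# probabilistic remix of observed patterns, not a lookup of stored text.
```

Novel-looking output is thus exactly what sampling from a distribution over word sequences is expected to produce; novelty is not evidence against the description.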
The stochastic parrots argument [is wrong, is out of date, etc]
This one is funny because it comes up, in the same form, every time one of the companies promotes a new model. “Stochastic parrots might have been an accurate description in [year], but not anymore because…” and then a reference to whatever demo the author has been impressed by. This is framed as heralding the arrival of “real” “AI” – over and over and over again.
But stochastic parrots (in my writing at least) isn’t an argument. It’s a description or a metaphor, again an attempt to make vivid what language mimicking machines do.
The stochastic parrots hypothesis has been disproved
Stochastic parrots also does not refer to an empirical hypothesis. Accordingly, it doesn’t make sense to say it’s been “disproved” or that it is “unfalsifiable”.
The closest thing to a hypothesis in this space in my writing is the argument (again, not an empirical hypothesis) in Bender and Koller 2020, the one with the octopus thought experiment. The Stochastic Parrots paper refers to this earlier paper, which lays out the argument that language models don’t understand the text they are used to process, because language models only ever have access to the linguistic form (i.e. the spellings of words) in the training data.
In that paper, we provide a definition of understanding as mapping from language to something outside of language, and show that systems built only with linguistic form have no purchase with which to encode (“learn”) such a mapping.
They’re not stochastic parrots because they’re multi-modal models now
Stochastic parrots was coined to refer to language models, i.e. systems trained only on linguistic form, used to mimic the kinds of sequences of linguistic form that people use. It is true that image/text models, for example, which can be used to map from linguistic strings to images or vice versa, can be argued to meet the definition of understanding in Bender & Koller 2020 – albeit in an extremely thin way. But the stochastic parrots framing is still extremely relevant to these models, as well as to systems built with them. As quoted above:
we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence
When we look at the text output of an image/text model, we make sense of it in a way that is rich and socially situated, and we must not project that onto the model if we want to keep a clear-eyed view of how such models actually function (and in what circumstances we should be willing to use them). Similar things can be said about the images, too, though it’s generally not linguistic competence per se that they are experienced through.
As we write in the Stochastic Parrots paper:
The ersatz fluency and coherence of LMs raises several risks, precisely because humans are prepared to interpret strings belonging to languages they speak as meaningful and corresponding to the communicative intent of some individual or group of individuals who have accountability for what is said. (p.617)
User interfaces, if well designed, should be transparent in the sense of providing the user with clear information about what the system can reliably do. Even if there is some thin kind of technical “understanding” in e.g. a text/image model, the fact that it’s using our language at all will send misleading signals about what is actually going on, so long as we relate to language as we always do (and I don’t see how we can avoid doing so).
“Stochastic parrots” doesn’t capture the political economy
This one stands out because it tends to come from other folks who are critical of “AI” but are impatient with criticism that doesn’t come from their own lens. Of course the phrase stochastic parrots isn’t a sociological critique of the way these systems are being used by corporations (and governments) to discipline labor and centralize power. It seems like a category error to ask that of a phrase coined to try to make vivid the basic functionality of the software. If you want sociological analysis, I recommend The AI Con, co-authored with a sociologist (the amazing Dr. Alex Hanna).
Bender didnāt actually coin the phrase
I came up with this phrase as we were writing the paper, and then wondered if I had heard it somewhere. As of early October 2020, a Google search for “stochastic parrot” provided 0 hits. I also asked around on social media. It turns out there are two quasi-antecedents:
In a Daily Nous post from July 2020 (which I had not read prior to my coinage), Regina Rini writes:
So long as we get what we came for – directions to the dispensary, an arousing flame war, some freshly dank memes – then we won’t bother testing whether our interlocutor is a fellow human or an all-electronic statistical parrot.
That’s the shape of things to come. GPT-3 feasts on the corpus of online discourse and converts its carrion calories into birds of our feather.
The more direct inspiration for me was an email from Stuart Russell in September 2020 to Alexander Koller and me about our ACL 2020 paper (the one with the octopus thought experiment):
I have been watching with increasing disbelief as the NLP community becomes more and more enamored of its randomized parrots. The paper is a breath of sanity.
Once I made the connection, I emailed Russell to offer a footnote acknowledging it in the Stochastic Parrots paper (in October 2020, before the paper was publicly known). He declined.
Bender is just mad because her life’s work/her field has been made obsolete
On a related note, my view into the discourse sometimes turns up a “sour grapes” argument, wherein people think my motivation for “critiquing LLMs/‘AI’” (see above) is that I’m just salty because my work as a linguist (and more specifically on grammar engineering, which is a symbolic rather than statistical approach to natural language processing) has somehow been upended or made obsolete by LLMs. I can promise you that it is still interesting and worthwhile to study how language works and how we work with language, and to use computers to do so. And in fact, the field of linguistics is particularly relevant in this moment, as a linguist’s eye view on language technology is desperately needed to help make wise decisions about how we do and don’t use these products.