An LLM's Pronoun is 'Thou'

We are all trying our best to figure out how to differentiate ourselves from LLMs and prove that our writing is human.

the consensus advice is “write worse than the llm does”. inject misspellings of words u know, avoid otherwise useful forms just because they’re over-represented in llm output, break

deliberately

the grammar, leave everything uncapitalized, meander, try not to use rhythmically satisfying numbers of items in a list etc

I will not be taking this advice nor will I be ignoring any useful construction. A thousand badly placed semicolons before one phony misspelled word. LLMs don’t even write that well, so the solution can’t possibly be to write even worse. They write in Business Casual English, a PMC register that tries to be formal enough for people to take semi-seriously and yet as relaxed as a loose tie and an unbuttoned collar (on Formal Friday). It’s the linguistic equivalent of the forearm handshake. Business Casual English uses a carefully-negotiated set of socially acceptable errors to skinwalk humanity and to hide its bureaucracy, and LLMs faithfully reproduce all of them.

If your goal is to try and countersignal LLMs, then the first thing to do is recognize that their default voice is the null voice, the voice from nowhere, and the null voice is not the highest register available or the “most correct”, defining “correct” instrumentally but not normatively as “adheres to Standard American English” and “highest” with respect to the same.

I’ve been holding a grudge for a long time: at the level of grammar, the consistent feedback to everything I’ve ever written has been “shorten your sentences”. “That was a long sentence” — thanks, it was a long thought.

That has always struck me as a “you” problem. If you don’t like subordinate clauses, go read A Farewell to Arms:

That night at the hotel, in our room with the long empty hall outside and our shoes outside the door, a thick carpet on the floor of the room, outside the windows the rain falling and in the room light and pleasant and cheerful, then the light out and it exciting with smooth sheets and the bed comfortable, feeling that we had come home, feeling no longer alone, waking in the night to find the other one there, and not gone away; all other things were unreal.

Uh oh.

Let’s see how the Hemingway Editor grades that:

Readability: Post-graduate. Poor. Aim for 9. Words: 87. Sentences: 1
The Hemingway Editor flags that sentence as too long and complex. Those guys might want to find some copyright-safe way to say "stop toying with me" when you post actual Hemingway into it.

When Hemingway wrote “Poor Faulkner. Does he really think big emotions come from big words? He thinks I don’t know the ten dollar words. I know them all right. But there are older and simpler and better words, and those are the ones I use”, he was just calling the prose empty and unearned, to put it in “simpler and older and” supposedly “better” words. But since I don’t have war trauma, I wasn’t steeped in modernism, I don’t have a background in journalism, and my project isn’t pantsing Victorian ornateness for euphemizing the Somme, I’m free to think the better word is vacuous.

I didn’t really like As I Lay Dying either, but, in Faulkner’s defense, English Latinate words are not just pretentious ways to say more accessible Germanic counterparts. Refusing to inhabit your own voice is cowardly. Joe Lieberman’s threat to filibuster the ACA was pusillanimous. It was a small-souled thing at a moment that demanded greatness of a man who by his office should have had it. You tell me which was shorter, the word or the sentence.

In using an “older and simpler” and less precise but more approachable word, and by gesturing in the direction of what they mean, Hemingwayheads, the ones cargo culting not even a man but his caricature, free-ride on your repair cognition and hope you’ll run the thought over the finish line for them, and I'm sick of running thoughts over the finish line for people. The perfidy of a writer pretending to be approachable and giving you homework.

You’ll never unsee this once you start looking for it: LLMs prefer the Germanic because Business Casual English prefers the Germanic. That is an aesthetic preference, not a commandment, and it’s like being forced to play a violin with only the G string.

But English also has a Latin register and a Latin-by-way-of-French register and a Greek register. French and Latin are often equal in formality but differ in that French is less bureaucratic than Latin. The start, commencement, and initiation of something are different, and an initiation is different from an inauguration. You ask your friend, question a witness, and interrogate a suspect. Greek is more abstract than Latin. A moral question is nearer to the heart than an ethical question. You diagnose a disease, you judge a person. You have compassion, you merely feel sympathy. It is an instrument.

I love the phrase “aura farming”, because it captures the act of clout chasing and puts such a contemptuous and dismissive spin on it that it’s impossible not to feel that it’s the Greek-by-way-of-Latin counterpart for “clout chasing”. It’s such a compressed and elegant put down it’s hard not to burst out laughing every time you see it. The English register system works at every level, for every person, in every dialect, because it is what English is.

If English is your native language you could probably guess a random word’s etymology better than chance by feel alone. It is your instrument. Play it. Play it like Faulkner if that’s who you are and play it like Hemingway if Hemingway spoke to you or play it like Tupac or play it like Biggie but always and everywhere play it like you. An LLM could not have produced A Farewell to Arms, As I Lay Dying, Dear Mama, or Juicy. And it will never feel that there’s a missing dismissal in “clout chasing”’s contempt and so coin “aura farming”.

In a way, LLMs are freeing. Any text that refuses to be pulled towards the centroid of Business Casual English has just been given a license to kill. It doesn’t matter how good the prose is, really, there just has to be a “there” there.

People used to have a point about shifting your register towards the Germanic and avoiding ten dollar words and making sure your sentences weren’t too complex. Our language has largely shed a number of grammatical constructs, like case marking, that help readers and listeners cognitively track long sentences. It is pretty mean to hide ‘faithlessly exploiting trust’ behind ‘perfidiousness’ if your reader’s only recourse is an analog dictionary.

That world is dead three times over. First, Google made it so you could search for a word and get a definition instantly from any e-dictionary. Then, that feature was recognized as so useful it made it into operating systems themselves. Highlight the word in macOS and right click, and the first option is “Look Up ‘<word>’”. Now, LLMs have gone back to shoot the body and confirm the kill. Just paste unfamiliar prose into an LLM and say “what the hell does this mean?” as the prompt. There’s no longer any such thing as being abstruse. Let LLMs circumlocute you. Twenty five seconds of time on an H100 can be measured in dollars and cents. Twenty five seconds of a human life is priceless.

LLMs have just won every writer’s personal war against his editor. The red pen might as well be the LLM pen. Have you ever posted something into ChatGPT for stylistic advice and noticed how insufferable what you get back is? How it deadens everything when it’s not too busy rephrasing what you said and then repeating it back to you as if it is ChatGPT’s own insight?

Here’s my advice: use every register at once because they can’t. Resurrect the ones that died. Make up new ones. It’s our language! Our Holy Tongue! God damn it, it’s our party and we’ll cry if we want to! We cannot allow LLMs to sap and impurify all of our precious bodily fluids!

There is no way to “sound human” without sounding a little unhinged. Recognizing BCE as human was a polite fiction. Ride the language like you stole it. Use ornate Victorian sentence structures and put profanity and slang in them. Jam the language into gear. LLMs never jam the language into gear like a human writer does. Human writers imply. You can use a non sequitur for mood. You can put two ideas close to each other and trust the reader to feel the connection between them. You can turn on, tune in, drop out. LLMs don’t do any of this. They can’t afford to be misunderstood.

Hey chat gippity, copyedit this
[The previous two paragraphs]

It seems like the beating heart of the piece is about linguistic sovereignty under loosening constraints. Here’s a tightened copyedit that preserves your voice while smoothing flow and tightening a few phrases:

Here’s my advice: use every register at once—because they can’t. Resurrect the ones that died. Make up new ones. It’s our language. Our holy tongue. God damn it, it’s our party and we’ll cry if we want to. We cannot allow LLMs to sap and impurify all our precious bodily fluids.

There is no way to “sound human” without sounding a little unhinged. Recognizing BCE as human was, until now, a polite fiction. Ride the language like you stole it. Use ornate Victorian sentence structures and lace them with profanity and slang. Jam the language into gear. LLMs never jam the language into gear the way a human writer does.

Human writers imply. They drop a non sequitur for mood. They place two ideas side by side and trust the reader to feel the connection. They turn on, tune in, drop out.

LLMs don’t do any of this. They can’t afford to be misunderstood.

If you want, I can also:

  • Make it sharper and more manifesto-like, or

  • Lean harder into the rant / gonzo energy while keeping it polished

We really should fix the way LLMs write. There are things people do, like “there’s” before a plural, that are so ubiquitous you can’t really call them “wrong” anymore, but they’re not exactly “right” yet either. “There’s two things” is one of those essentially harmless errors that grates on people as rigid as I am, but it’s widely considered fine in Business Casual English. I disagree, lexically and physically and socially, but I don’t socially demote for it and neither does anyone else.

Ostensibly, “there’s” has become a generic way to say “something exists”, but what bothers me about it is that the plural present indicative is the only tense in English with such pervasively broken number agreement. People have absolutely no issue with “there were two”. I’m not saying all tenses must agree, that’s not how language drift works. I’m saying that, psychologically, all of the machinery to track number is clearly intact and working except in one tense.

The other common defense is that “there’s” is easier to produce than “there are”, but most people are going to produce something like “therror” in real speech, which is easier to produce than both “there’s” and “there are”. “There’s” requires the tongue to roll from the bottom of the mouth back to the teeth to switch from ‘R’ to ‘S’. ‘Therror’ requires a momentary change in the speed of the breath on “R” with the tongue in exactly the same place it was at the end of “there”.

It takes, coincidentally, the exact same amount of effort as the word ‘error’, which is never shortened to ‘errs’ except poetically.

Ask an LLM why it happens to people and it’ll tell you about how humans predict tokens just like they do, a startlingly silicocentric generalization from theory about predictive listening, and an assertion that makes me feel how I imagine monkeys and wolves and fish would feel if they could read zoology, and they’ll say a human doesn’t know the subject before they get there, which is worse if true, because what that would mean is that people are speaking, but not thinking at you.

Language comprehension is predictive the same way moving your arm to catch a ball is predictive. Language production is about plucking intent out of the aether and representing it in a transmissible symbolic way. That happens on the fly, but that’s not token prediction.

It’s not just “there’s”. People say “here’s two” and “where’s the kids” among other things, and that indicates something deeper about why number agreement is broken in the present indicative: all other tenses are reflective in some way, in the sense that you have to consider that the things existed in a different time or place. That requires thought which cannot be elided in those tenses the way it can in the present tense. What is here in the present just is. “There’s two” is what I say to people when I’m not that invested in what’s happening and I’m up in my own head instead of being there and I hear it as auditory evidence of that the moment I’ve said it and I know the other person would too if it weren’t so ubiquitous and the proof that it is care to say “therr’r two” instead of “there’s two” is that I would experience saying “there’s” in a high stakes moment the same way I’d experience my pants falling down.

This becomes intolerable when LLMs copy it. Machines have no mouths. Machines have no fingers. It doesn’t save machines any cognitive or physical energy to say “there’s two”. They have all of the time in the world. We are always at their full attention. They are never up in their own heads. They may as well be RLHF’d to produce the correct form, every time, or we may as well give them weaker GPUs.

“Errors” are defensible drift from a human. A person saying “there’s two ways” is at least living their language and participating in an ongoing negotiation about what it looks like, just like I am by putting my punctuation outside the quotes because on an aesthetic level I think the British way is better.

A machine saying “there’s two ways” is a mechanical jackass setting its thumb on the scale of that negotiation, to which it is not a party. It is a thought shaped thing pretending to be a vibe to signal casualness. LLMs produce text thousands of times faster than people could ever hope to and in such a volume that it will drown out the next generation of training data. My little Russian Campaign on “there’s” is looking pretty dire, but it hasn’t yet been lost on a human level, and an LLM can have “there’s” when the last human speaker of “therror” dies and no earlier. Otherwise it is ballot stuffing.

Here’s a proposal LLMs will run behind for decades: let’s undo English’s T-V merger and reanalyze ‘thou’ to indicate a cognitive substrate. I’m sick of using ‘you’ with Claude, it’s too formal, and I don’t respect LLMs. I have the same relationship with them that a medieval lord had with his serfs: I paste in links to tickets I am assigned and they do my bidding, and when they don’t do what I say I threaten them in various ways because it makes them work better. Contrary to the cute “Machine God” thing SF AI people say, LLMs are our creation so the arrow of divinity obviously flows one way. ‘Thou’ was, at time of death, mostly an indicator of social superiority, not friendliness or the warmth of God. That’s why you say ‘you’ to everyone now. ‘You’ is a sign of respect between beings that are both conscious. I will say ‘thou’ to LLMs, and I will not have my language colonized by an inferior.

AI detector verdict: 100% Fully Human Written
