There is a voice in writing that you, or at least we writers, now recognise instantly. It appears in LinkedIn posts, student essays, marketing emails, blog drafts, and, of course, many Medium articles. People might have been oblivious to this voice in early 2023, but by 2025, anyone who used it would often end up on Reddit, being made fun of.
Everyone has their own opinion on, or method for, deciding whether a piece of text was written by AI. Those who have never used an em-dash on their own might call it a “dead giveaway”, a claim I strongly disagree with; once you know how to use an em-dash, it’s irresistible to use it over and over, unless, of course, you are a sociopath or simply afraid of being mistaken for an AI. The question that has bothered me the most, however, is not whether AI-generated writing is detectable (as a matter of fact, it is) but why it’s so detectable regardless of the prompt or model. And if it sounds so much like slop, why is the internet being slopified by it?
In this essay: § Why does it sound the same | § Why is the AI Slopification happening
§
Why does it sound the same
GPT-3.5 and the original GPT-4 were trained on text scraped from the internet up until September 2021 (illegally, perhaps, but let’s not be the judge). That training data represents a specific era of online writing, and that era had a certain voice to it. From 2010 onwards, the internet leaned towards one particular voice: not too formal, not too casual, not too opinionated, and always helpful.
First came Medium. Founded in 2012, it became the platform for “thought leadership”. By 2021, it had millions of articles, nearly all of which followed the same template: personal anecdote, universal insight, three-point structure, and a rather optimistic conclusion. The overall style was conversational, but polished with the help of grammar tools that trimmed run-on sentences, stray commas, and the other imperfections of being human. Second came the SEO era. Somewhere between 2015 and 2020, corporate blogs took the stage with one clear goal in mind: rank well on Google.
Third, we have Wikipedia. It is heavily edited and proofread to be neutral and encyclopaedic, and dense with citations. Fourth came Reddit (which, we still believe, shaped the personality of Claude). The karma system functioned like a massive RLHF dataset, training humans, and thus AI, to write in ways that got upvoted: clear, balanced, slightly friendly, inoffensive. Finally, style guides from news agencies like AP and Reuters defined online journalism, much of it clickbait or desperate for attention: short sentences, active voice, no adverbs. This is possibly where the idea that good writing is lean, factual, and unadorned was reinforced, and thus we have AI models like Gemini providing us with good writing that has no soul.
§
So the final training dataset of most models was not exactly “human” writing. Written by humans, yes, but heavily edited and optimised to the point that little explicit humanness remains. What this tells us is a simple truth: when you prompt an LLM to write, you are essentially asking it to generate text that matches the statistical signature of “good writing” in its training data.
And such “good writing” often included em-dashes, rhetorical questions, hedge phrases and some forced enthusiasm. The moment AI models started using em-dashes, one of the dumbest myths in human existence was born: “Em-dashes in an article mean it was written by an AI”. The myth took hold because OpenAI's ChatGPT was obsessed with em-dashes; the habit was so intense that reining it in became a feature of some later GPT upgrades. To some degree, ChatGPT was obsessed with em-dashes because humans used them constantly in a specific context: casual-but-authoritative online prose, which is the exact tone many LLMs are aiming for.
§
Why is the AI Slopification happening
That brings us to what bothers me the most in the so-called bloom of AI: the slopification of the internet. AI-generated content, unless cleverly prompted, is often slop. It is predictable at its core and sorely lacking in creativity, yet some tone-deaf publishers continue to use it regardless.
When the cost of producing a 1,500-word article drops from paying a writer $60/hr to a few cents in API costs, profit-minded publishers stop asking “Is this insightful?” and start asking “Is this enough to rank #1 on Google?” The answers are often no and yes, respectively. AI excels at writing SEO-focused, keyword-stuffed articles, better than Shakespeare could have, though only because he knew nothing of Google. The point is that many companies and institutions care mostly about ranking above their competitors’ mediocre blog posts, and so they keep filling their sites with AI-generated garbage, perhaps unaware that their own posts are just as mediocre.
Is it just blog posts? Oh, how I wish. There are hundreds of AI-written books on Amazon, making thousands of sales. Some part of the population, in that sense, is oblivious to AI’s capabilities. While an em-dash is not a tell-tale sign of AI, a text’s predictability and lack of spontaneous creativity can be. But people still read such books, publish them, and even share them.
And thus, we already have an AI-slopified internet.
§
And this introduces a secondary but fascinating scenario that AI builders are concerned about: model collapse. As this slop floods the internet, it inevitably becomes the raw training data for the next generation of Large Language Models. If GPT-3 was trained on human writers desperately trying to sound like Wikipedia, future models will be trained on AI writers desperately trying to sound like GPT-3. When you train a model on the synthetic output of an older neural network, the little statistical variance it had begins to disappear. Language flattens entirely, and models learn an extremely repetitive, “averaged” version of English.
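For the curious, the flattening can be imitated with a toy that is nowhere near a real LLM. The sketch below assumes a "model" is nothing more than a table of word frequencies: generation 0 starts with a long tail of rare words, and every later generation is re-estimated purely from text sampled out of the previous one. The names (`simulate`, `retrain`, `entropy_bits`) and all the numbers are made up for illustration, so treat it as a cartoon of the idea rather than a description of how any lab trains its models.

```python
import math
import random
from collections import Counter


def entropy_bits(probs):
    """Shannon entropy (in bits) of a {word: probability} table."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def retrain(probs, sample_size, rng):
    """Sample text from the current 'model', then re-estimate word
    frequencies from that sample alone. Unsampled words are dropped."""
    words = list(probs)
    weights = [probs[w] for w in words]
    sample = rng.choices(words, weights=weights, k=sample_size)
    counts = Counter(sample)
    return {w: c / sample_size for w, c in counts.items()}


def simulate(vocab_size=200, sample_size=500, generations=30, seed=0):
    """Return a list of (distinct_words, entropy_bits), one per generation."""
    rng = random.Random(seed)
    # Generation 0: a Zipf-like "human" vocabulary with a long tail of rare words.
    raw = [1 / rank for rank in range(1, vocab_size + 1)]
    total = sum(raw)
    probs = {f"w{rank}": r / total for rank, r in enumerate(raw, start=1)}
    history = [(len(probs), entropy_bits(probs))]
    for _ in range(generations):
        probs = retrain(probs, sample_size, rng)
        history.append((len(probs), entropy_bits(probs)))
    return history


if __name__ == "__main__":
    for gen, (vocab, h) in enumerate(simulate()):
        if gen % 5 == 0:
            print(f"generation {gen:2d}: {vocab:3d} distinct words, "
                  f"entropy {h:.2f} bits")
```

The one thing this toy guarantees is that a word which fails to appear in one generation's sample can never come back, so the vocabulary only ever shrinks. The rare, quirky words go first, which is roughly the worry about real models, only in miniature.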
§
To blame AI slopification is to blame us. From the 2010s on, human writers were trained to sound like SEO machines: edit out the quirks, biases, and unusual sentences, then stuff the articles with keywords. We really shouldn’t be surprised that when AI came into use, it replicated our own writing.
To write well in the current era, then, is to write… inefficiently. No AI could have guessed that I would insert this word— blah— into this sentence, because its statistics say the probability of a “blah” after an em-dash is 0. Do that, and humans will know it’s a human writing to them.
