The Hardest Test for AI Isn’t Math. It’s Writing.


On AI Twitter, a clip from The Joe Rogan Experience with Ben Affleck and Matt Damon is being hotly debated right now. Many people latch onto the claim that AI progress is no longer exponential and is being overhyped by companies and investors: each marginal performance gain is getting smaller and more expensive to buy.

A different point interests me more: AI can’t write well.

Sure, AI is a transformative technology; I firmly believe that. I've been in the field for ten years (back when we still called it data science, machine learning, or even "pattern recognition"). I spend hours every day prompting AI assistants and coding agents. With complemental.ai I built an AI-based knowledge platform that makes tacit human knowledge usable. In projects with companies of all kinds, I've designed and built AI systems. I'm no enemy of AI or of technology per se, and no Luddite. But I know the limitations of these models. And I see them every day.

Above all, I see this: their abilities differ wildly by domain. Anyone who codes with LLMs or builds their own agents knows that sometimes Claude finds a solution GPT doesn't, and sometimes it's the other way around. "The model can do everything" was never true. The only true statement is: this model can do this right now, under these conditions, with this prompt, with this context, with this kind of problem.

To judge AI as a "writer," you need experience in both worlds: AI and literature. What many people don't know is that, alongside tech, literature is my second great passion. I didn't just study computer science; I also studied literary studies. And I write creatively myself: I've given readings on stage, and I write as much as I can alongside my technical work.

And yes, I've turned to LLMs many times to speed up work on a novel project: out of curiosity, out of pragmatism, and also out of a certain stubbornness. Maybe it is possible after all? Maybe you just need the right prompts, the right workflows, the right combination of outline, style constraints, context window, and iteration.

So far, the AI ghostwriters have disappointed me every time.

Whether it's GPT, Claude, or Gemini: anyone who believes AI can already reliably write a good scene or a novel chapter underestimates how hard it is to hit and sustain literary quality. In the best case, what you get is a simulacrum close to the statistical expectation value: correct, smooth, predictable. In the worst case, the text turns incoherent after a few pages, and the model forgets core details about characters, relationships, and motives. It loses the thread exactly where the writing starts to get interesting. In short: pretty "shitty," to use Affleck's words.

You learn something banal but important: writing literature worth reading is a more complex problem than writing code. That's not a dig at programming. I like writing code, and I know how hard good software is to build. But the kind of "correctness" you aim for in programming is different from the kind of "correctness" in writing.

In programming, the space of solutions is large. In writing, it's infinite. And more than that: in programming, "average" is often acceptable, even desirable (readability, conventions, best practices). In literary writing, "average" is almost always fatal. What makes a text (or a film) interesting is often found at the edges: the unusual, the authentic, the strange. The places where the text is not the expectation value.

That, I think, is one reason why LLMs so quickly slide "into the middle" when they write. In a way, they are mean-value machines. They can imitate, compress, and vary, very convincingly. But they struggle with what isn't merely variation but decision: voice, stance, rhythm. A way of seeing the world that doesn't come from the sum of the training data, but from a person.
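To make the "mean-value machine" point concrete, here is a minimal Python sketch with made-up numbers; nothing below comes from a real model, and the multi-word phrase is treated as a single candidate only for simplicity. It shows the softmax-and-temperature step at the heart of LLM decoding: the expected word wins almost every draw, while a distinctive choice like the "color of television" from Neuromancer's famous opening sits so deep in the tail that standard sampling practically never reaches it.

```python
import math

# Toy next-token distribution after "The sky was ..." -- invented logits,
# purely illustrative; a real model scores ~100,000 tokens at every step.
logits = {
    "blue": 4.0,
    "gray": 2.6,
    "clear": 2.3,
    "darkening": 0.4,
    "the color of television": -1.2,  # treated as one candidate for simplicity
}

def softmax(scores, temperature=1.0):
    """Turn logits into probabilities. Low temperature sharpens the
    distribution toward its mode; high temperature flattens it."""
    exps = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

for t in (0.7, 1.0, 1.5):
    probs = softmax(logits, temperature=t)
    top = max(probs, key=probs.get)
    print(f"T={t}: '{top}' wins at {probs[top]:.0%}; "
          f"'the color of television' gets {probs['the color of television']:.2%}")
```

Raising the temperature does widen the distribution, but indiscriminately: it boosts noise in the tail just as much as voice. What it cannot do is make the kind of decision described above.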

Does that mean AI is useless for writing? Quite the opposite. For research, revision, and brainstorming, it's fantastic.

Sometimes you need very specific expertise to make a side character believable. With today's AI assistants, you can often spare yourself long library research or expert interviews, or at least prepare them intelligently. If a story needs a particular obstacle for the characters and none comes to mind, AI often produces good suggestions that break the block. And in revision, it can help you spot repetition, shaky logic, wrong tenses, and overlong sentences. It's a good mirror. Sometimes even a good sparring partner.

But as an “author” that just spits out a chapter? So far: no. And I suspect this isn’t just about the next model. It’s about what literature is in the first place: not primarily information, but form. Not primarily output, but judgment. Not primarily pattern, but breaking patterns in exactly the right places.

What holds in programming, and in building AI agents, holds in creative writing too: if you want to use AI in a meaningful, professional way, you need more than AI knowledge. You need to understand the domain, and you need to be able to evaluate. And then, of course, you have to experiment, practice, and build tools and processes for everyday use. Otherwise it won't work: not with the new AI tool, and not with the great German novel.
