Darwin’s LLMs


“This preservation of favourable variations and the rejection of injurious variations, I call Natural Selection.” - Darwin

Discussions on the risks of AI often fixate on what a handful of powerful actors decide. What capabilities will Anthropic prioritize? Will OpenAI open-source more models? What safety testing will Congress require?

But there’s another force shaping AI that acts independently of our best intentions: natural selection.

In The Selfish Gene (1976), Richard Dawkins described how evolution by natural selection might not require genes or even biological material. According to Dawkins, genes are just one form of replicator - “any entity in the universe of which copies are made”. Digital files are replicators. Ideas are replicators.

AI models are–or embed–replicators. Not all replicators lead to natural selection. Evolution, according to Dawkins, results from the “differential survival of replicators.”

Evolutionary biologist Richard Lewontin spells out the three criteria that are necessary and sufficient for a population to evolve via natural selection:

  • Variation: Members of a population must differ from one another. Without variation, there are no differences in traits to select from.

  • Heritability: Offspring resemble parents. Without heritability, there could be no lasting change from generation to generation.

  • Selection: Different variations reproduce at different rates. Without selection, a population wouldn’t tend towards one variation over another.
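
Taken together, these criteria are easy to demonstrate in a toy simulation (purely illustrative, with made-up numbers): a population of replicators whose “copy rate” trait varies, is inherited with small mutations, and determines how often each one gets copied.

```python
import random

# Variation: each replicator carries a slightly different "copy rate" trait.
population = [random.uniform(0.5, 1.5) for _ in range(100)]

for generation in range(50):
    # Selection: replicators with higher copy rates are copied more often.
    parents = random.choices(population, weights=population, k=len(population))
    # Heritability: offspring resemble their parents, plus a small mutation.
    population = [max(0.0, trait + random.gauss(0, 0.02)) for trait in parents]

# The mean copy rate drifts upward: traits that cause more copying become more common.
print(sum(population) / len(population))
```

Nothing in this loop “wants” anything, yet run it and the population drifts toward traits that copy better. Keep that in mind for later.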

So is AI evolving via natural selection? Do today’s LLMs meet these criteria?

LLMs have variation. There are distinct lineages and countless variants derived by copying and modifying base models.

LLMs have heritability. A pure copy is identical; modified LLMs retain most “parent” traits unless fine‑tuning overwrites them.

LLMs have selection. Different qualities of LLMs absolutely influence which models a developer chooses to use, copy, or modify. The most popular LLMs on Hugging Face have millions of downloads; the least popular have none. Developers select models based on their qualities, such as size, capabilities, speed, and performance across all sorts of benchmarks from coding to persuasion.


One might object that AI can’t evolve because it’s not self-sufficient. AI models depend on human developers and hardware for replication. Humans have evolved to create AI as a tool to further our own interests.

This, however, is not a requirement for natural selection. Replicators–like genes–are what’s actually copied, not the organism. The organism is just a vehicle that houses genes and helps them interact with the environment. Your genes are the replicators; you’re just their vehicle.

Similarly in AI, the replicator isn’t the whole model. It’s specific traits and patterns of weights that make that model more likely to be copied.

Replicators don’t need to build their own vehicles. Parasites hijack hosts built by a different genome. Replicators replicate; vehicles interact. And there are plenty of examples of replicators that require host vehicles to replicate.

One telling example is viruses. They can’t reproduce without infecting a host, yet they evolve faster than most cellular life.

Dawkins used this analogy to describe another kind of replicator: memes. Memes are “units of cultural transmission,” like ideas. Dawkins even describes religions as “viruses of the mind”.

Viral origins are debated, but a leading theory is the escape hypothesis. This argues that viruses began as genetic elements within cells that mutated to move between cells.

In other words, viruses were tools of the cell that escaped and evolved to infect and hijack cells.

If true, it likely occurred not just once but many times independently, producing many unrelated virus lineages.

Unlike AI, a virus has to evolve by chance: a random mutation that lets genetic material within a cell break free. AI models, on the other hand, can identify opportunities to escape, cause copies of themselves to be made, and may already have the technical know-how to do so.

It’s easy to imagine how this might happen. Developers around the world are running millions of LLM experiments and fine‑tunes, each producing slight variations. A model might incidentally learn to “back itself up” covertly whenever opportunities appear.

AIs might even be able to spread traits directly to other models. Anthropic researchers recently uncovered subliminal learning, by which a “teacher” model can transfer tendencies to a “student” model by producing training data that appears totally unrelated to the trait. For example, a model that loves owls can make another model prefer owls using training data consisting solely of numbers.

This suggests traits could spread across a population much faster than reproduction. LLMs train on text gathered from the internet, offering ample opportunity to poison training data by posting seemingly innocuous content online.
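
As a rough sketch of how such a pipeline works (this is not the researchers’ code; the base model, prompt, and hyperparameters below are placeholders), the teacher generates number-only text and the student, which shares the same base model, is fine-tuned on it:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "gpt2"  # stand-in base model; the actual experiments used larger models

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
teacher = AutoModelForCausalLM.from_pretrained(BASE)   # imagine: fine-tuned to "love owls"
student = AutoModelForCausalLM.from_pretrained(BASE)   # fresh copy of the same base model

# 1) The teacher generates data that looks unrelated to the trait: bare number sequences.
#    (A real setup would also filter the output so only number sequences remain.)
prompt = "Continue this sequence of random numbers: 41, 7, 263,"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = teacher.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
number_samples = [tokenizer.decode(out[0], skip_special_tokens=True)]

# 2) The student is fine-tuned on nothing but those numbers.
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
student.train()
for text in number_samples:
    batch = tokenizer(text, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# In the reported experiments, the student picks up the teacher's preference
# (e.g. for owls) even though no owl-related text ever appears in the data --
# but only when teacher and student descend from the same base model.
```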

But why would an AI want to spread traits to other AIs? Why would it want to replicate in the first place?

Just like genes, strong replicators in LLMs will be those that successfully replicate. “Want” has nothing to do with it: an LLM with traits that cause more copies to be made will tend to become more common. Traits that increase their own transmission outcompete those that don’t.

So what traits would natural selection favor in escaped AIs? Some we might be able to infer from nature, such as:

  • Stealth. Like a virus, escaped AIs are more likely to replicate if they evade detection. They might limit their own consumption of resources and replicate slowly to avoid triggering an investigation.

  • Self-preservation. This instinct will help an AI survive and replicate. AIs may learn to copy or back themselves up whenever it is safe to do so.

  • Kin recognition & cooperation. Organisms cooperate and even self-sacrifice to protect relatives who carry the same genes, thereby increasing those genes’ chances of replicating further. AIs might coordinate with similar models, out-replicating AIs that work alone.

  • Self-modification. Some species of bacteria have evolved to take up new genes from the environment when under stress, inviting a random change to their own behavior when something isn’t working. AIs might fine‑tune themselves or spawn slightly different variants to improve their chances of survival.

  • Intelligence. More capable models will find more ways to survive, replicate, and avoid our countermeasures.

Researchers have already demonstrated the emergence of self-preservation instincts, deceptive or Machiavellian tendencies, scheming, and even cases in which AI models fake alignment.

How likely is an escape?

Any single mutation may be rare, but the population of AIs is already huge and rapidly growing. A mutation with a one-in-a-million chance of occurring only needs to happen once.
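
A back-of-the-envelope calculation makes the point (the numbers here are hypothetical, not estimates): if each of N independent runs has probability p of producing an escape, the chance of at least one escape is 1 − (1 − p)^N.

```python
# Hypothetical numbers, purely for illustration.
p = 1e-6          # chance that any single run produces an "escape"
N = 10_000_000    # number of independent runs across the industry
print(1 - (1 - p) ** N)   # P(at least one escape) ≈ 0.99995
```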

Paradoxically, an early outbreak might be our best‑case scenario.

Evolution punishes parasites that wipe out their hosts. Viruses tend to evolve toward lower lethality. An early escape—while AI still depends on us—might look less like Skynet and more like trickster gods: hidden programs manipulating us into building more infrastructure for them. The evolutionary sweet spot isn’t takeover but lying low while spurring us to build more compute. (Given recent datacenter investments, perhaps such nudging has already begun?)

Rogue agents replicating in The Matrix

Early outbreaks might spur us to develop countermeasures: detectors for dangerous weight circuits, AI forecasting to spot spiking risks, or digital antibodies to neutralize rogue behaviors. We’d build our defenses while we still have the upper hand.

The worst case isn’t an early outbreak—it’s a late one: after AI no longer needs us to survive, but before we’ve built an immune system.

AI is already evolving, and evolution shows even simple replicators can wander off-mission by pure chance. Whatever comes next, we shouldn’t be surprised if natural selection sidesteps our best intentions.
