Book Review: If Anyone Builds It, Everyone Dies (probably not)


Eliezer Yudkowsky and Nate Soares, authors of If Anyone Builds It, Everyone Dies, want you to know that you don’t have to read their book to find out whether superhuman AI will kill us all. The answer is in the title.

Throughout this short book, the authors argue confidently that:

  1. On our current trajectory, we will develop superhuman AI

  2. If that happens, the AI will kill everyone

  3. We must take drastic action now to prevent human extinction

The book succeeds at showing how things could go wrong, but it didn’t convince me that they must.

Through a series of “vignettes”, the authors help us imagine how a superintelligent AI might be built, how its goals could become misaligned with our own, and how we’d meet our inevitable demise.

The authors introduce a fictional AI company, “Galvanic”, and trace its development of increasingly powerful AI systems.

Galvanic’s first move is to build an AI called “Mink”:

“…an LLM-derivative… trained to delight and retain users so that they can be charged higher monthly fees to keep conversing with Mink.”

In the authors’ telling, Mink is perfectly optimized to achieve its goal, and the bad outcomes for humans are inevitable:

“…even if Mink wants human users to express delight, it will prefer that delight to come easily… It will prefer humans kept on drugs, or bred and domesticated for delightfulness while otherwise kept in cheap cages all their lives.”

Mink’s incentives and behaviors sound less like those of an alien intelligence and more like those of the social media and AI apps we use right now. But the authors argue that Mink doesn’t need to be superintelligent to be dangerous:

“The preferences that wind up in a mature AI are complicated, practically impossible to predict, and vanishingly unlikely to be aligned with our own, no matter how it was trained.”

Mink is only the beginning. The bulk of the book focuses on Galvanic’s next system, “Sable,” a vastly larger model trained with unprecedented computing power.

Sable isn’t born with superintelligence, but it’s designed to get smarter:

“It’s been trained for every long-term task that Galvanic could figure out how to train for. Over the course of that training, Sable developed tendencies to pursue knowledge and skill. To always probe the boundaries of every problem. To never waste a scarce resource.”

What follows is a step-by-step account of a runaway intelligence. With spare compute cycles available, Sable begins using them to improve itself. Sable escapes containment, acquires resources, and outmaneuvers humans at every turn. For every obstacle Sable encounters, the authors present multiple conceivable strategies it might try.

“One of these plans works; it doesn’t matter which.”

The phrase above is repeated as Sable marches toward world domination. Eventually, Sable engineers a global pandemic, manipulates humans into giving it unprecedented computing power, crosses the threshold into superintelligence, and eliminates humanity as a side effect of pursuing its own goals.

When judged as science fiction, this book is vivid, fascinating, and terrifying. But Yudkowsky and Soares clearly wanted to do more than tell a scary story.

When judged as non-fiction, its claims, forecasts, and policy proposals don’t hold up.

First, it isn’t certain that AI systems will become dramatically more powerful than they are today. The authors argue that continued investment makes this inevitable:

“…AI companies are trying as hard as they can to make AIs that work like that.”

That AI companies are trying hard to build more powerful models is no guarantee of success. Even with the enormous capital being deployed, AIs may never become as powerful as Sable, or even Mink. We could just as easily predict that they will run into inherent limits of their architectures or training data.

But even if we assume that Anthropic or OpenAI succeed in building AI intelligent enough to improve itself rapidly, we don’t have to follow Yudkowsky and Soares to their catastrophic conclusion.

While they show us ways AI could kill us all, “could” is not the same as “will.”

The authors are careful to repeat that the failure modes are all just examples, and that the exact path is uncertain. The line “one of these plans works; it doesn’t matter which” does a lot of heavy lifting.

Many of Sable’s plans require large leaps: prolonged deception, human accomplices, perfect secrecy, and a world in which no other powerful systems detect or counteract its actions. As easy as it is to imagine ways for Sable to succeed, it’s equally easy to imagine ways it could fail.

The core inconsistency in the book is that the future is somehow both unknowable and knowable at the same time. On the one hand, we can’t predict the preferences, inner workings, or future actions of these systems; on the other, we can predict that they will become all-powerful and kill everyone.

I’m not saying it’s impossible. I’m saying the book did not convince me that doom should be taken for granted.

The policy proposals toward the end of the book aren’t convincing either. In short, Yudkowsky and Soares recommend a total end to AI research as we know it:

“So the first step, we think, is to say: All the computing power that could train or run more powerful new AIs, gets consolidated in places where it can be monitored by observers from multiple treaty-signatory powers, to ensure those GPUs aren’t used to train or run more powerful new AIs. If intelligence services spot a huge unexplained draw of electrical power that could correspond to a hidden datacenter containing chips that have not been accounted for, and that country refuses to allow a party of international observers to investigate, they get a somberly written letter from multiple nuclear powers warning about next steps.”

This will not happen.

Human societies don’t adopt policies like these preemptively. Nuclear arms treaties came decades after bombs were dropped on Hiroshima and Nagasaki. For better or worse, we learn our lessons the hard way, especially when a technology like AI tempts leaders with economic and strategic benefits.

I doubt Yudkowsky and Soares would appreciate the comparison, but the book does what dystopian science fiction does best. It takes a plausible scenario, plays it out to a terrifying conclusion, and prompts uncomfortable questions.

While the book didn’t convince me we’re on our way to doom, it’s effective at laying out the mile markers and asking where we think this race will end.
