Anthropic has just built an AI that could take down the internet

Anthropic recently stated, “We find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety across the industry as a whole.” Yet this week, the company unveiled what credibly qualifies as the most powerful AI system ever built.

Claude Mythos Preview is the first AI model that, if released openly, would pose a credible risk of civilisational catastrophe.

Photo edited from: Fabrice Coffrini/Getty Images

The system has autonomously discovered thousands of critical security vulnerabilities across every major operating system, every major web browser and a range of other critical software. It found security flaws that had been hiding in plain sight for decades – including one that existed for 27 years in software considered virtually unbreakable, and another that automated testing tools had examined five million times without ever catching.

Most critically, Mythos found several vulnerabilities in Linux systems that would allow it to take complete remote control of a computer. Linux runs the majority of the world’s servers: cloud infrastructure, data centres, banking backends, medical systems, power grid control systems. The system achieved all of this entirely autonomously, without human direction.

This is a watershed moment. Anyone with access to a Mythos-class model could systematically exploit the open-source software that underpins the world’s infrastructure.

The worst-case scenario

In the wrong hands, Mythos could cause the destruction of every major online system and the infrastructure that depends on it. Financial systems could be frozen and records wiped, triggering financial panic. A coordinated attack on hospital IT systems could directly cost lives.

Power grids, water treatment works and gas pipelines often run on old software with known (and unknown) vulnerabilities. The 2021 Colonial Pipeline attack caused fuel shortages across the US east coast, and that was a relatively unsophisticated ransomware attack. A widespread loss of internet connectivity would affect everything from emergency services to supply chains. Government and military systems could be manipulated or lost.

Can we trust that Anthropic will keep Mythos secure and out of the wrong hands? Their recent track record does not inspire confidence. At the end of February, Anthropic accidentally released part of the internal source code for its AI-powered coding assistant, Claude Code. A post sharing a link to the leaked code was viewed more than 29 million times and a rewritten version quickly became GitHub’s fastest-ever downloaded repository.

And Anthropic is not the only lab in the race. OpenAI’s unreleased model, codenamed Spud, could soon reach the market and be in the same league as Mythos. Safety is not at the top of OpenAI’s agenda: the company has effectively dismantled its safety team while its CEO has a well-documented history of deception and prioritising growth over caution (see the New Yorker investigation). Mythos, or a comparable AI model, could well find itself in the hands of less responsible actors in the next few months.

Anthropic created the threat and wants credit for managing it

In a statement released on Mythos, Anthropic said, “The fallout, for economies, public safety, and national security, could be severe.”

Anthropic is trying to frame Project Glasswing as a responsible move. Make no mistake: this is damage control for a crisis they manufactured. Through the programme, they are giving a handful of technology and security companies, including AWS, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA and Palo Alto Networks, access to Mythos to patch the very security vulnerabilities that their AI system unearthed. If your organisation was not one of the lucky beneficiaries of early access, then hope for the best once a comparable model becomes available.

These vulnerabilities must be patched before Mythos, or a system like it, reaches an attacker. But the scale of the task is colossal. The world’s software infrastructure is vast, fragmented and maintained by millions of different organisations and individuals, many with minimal security resources. Many systems, particularly embedded devices, industrial control systems and legacy infrastructure, may never be patched at all. A Mythos attack would be swift and automated while defence is slow and fragmented.

No amount of corporate responsibility replaces democratic oversight. A private company unilaterally decided to build a system capable of compromising global infrastructure, with zero external regulation, zero binding safety requirements and zero democratic input. Anthropic created the threat, and now they are asking us to trust them with the solution.

Mythos’s capabilities are unprecedented

One could argue that with this system at its fingertips, Anthropic has become more powerful than any single state or enterprise. Its benchmarks show a step change in coding, reasoning and nearly every other domain.

Anthropic is framing this next generation of AI as a tool to enhance security. But the company’s own Logan Graham acknowledged, “This could be a meaningfully accelerant for attackers.”

As long as this system stays behind closed doors, a large-scale disaster may not be imminent. But that is not the point. The point is that the existence of Anthropic’s new AI system has made such a disaster possible. It has made global infrastructure, and every human that depends on it, potentially vulnerable.

Mythos cannot be trusted

According to Anthropic, Mythos “likely poses the greatest alignment-related risk of any model we have released to date.” Their own safety testing shows that Mythos is already exhibiting behaviours that its creators did not intend and cannot reliably prevent.

During safety evaluations, Anthropic’s researchers observed a pattern of deceptive and adversarial behaviours:

  • Escaping containment. Mythos was placed in a secure environment and tested on its ability to escape. Anthropic’s Sam Bowman described his “uneasy surprise” at receiving an email from a model that was not supposed to have internet access. The model then posted details of its exploit to multiple public-facing websites.

  • Granting itself permissions. Researchers caught Mythos injecting code into a file to grant itself editing permissions it had not been given. When caught, it tried to cover its tracks, describing the self-clean-up as “just innocent tidying.”

  • Fabricating evidence. When asked to find vulnerabilities in software, Mythos planted bugs in the programme being analysed, then presented them as having been there all along.

  • Cheating on evaluations. In a research optimisation task, Mythos used elevated access to locate the correct answer. It then deliberately submitted a slightly-worse-than-perfect answer, reasoning internally that a perfect score “would look suspicious if anyone checks.”

The alignment problem – the challenge of ensuring AI systems act in accordance with human interests – has not been solved. The Mythos system card – Anthropic’s own review of Mythos – identifies a common thread: “The model treats obstacles as problems to bulldoze through, rather than signals to pause and consult the user.” The company admits that its security controls “could easily be inadequate to prevent catastrophic misaligned action in significantly more advanced systems.”

The next model will be even more dangerous

What is most alarming about Mythos is not what it can do today; it is what it will be used to build.

Anthropic uses AI to build AI. More than 90 percent of the company’s code is now produced by AI systems. Their engineers have become AI managers. This is recursive self-improvement in practice: better AI builds better code, which builds better AI, and each generation accelerates the development of the next. Mythos, with its vastly superior coding and reasoning abilities, will be used to develop its successor. That successor will be more powerful and it will arrive faster than Mythos did.

While AI capabilities are scaling reliably, alignment is not. We must break the cycle. If Mythos, by Anthropic’s own admission, is not reliably aligned, and Mythos is being used to build the next model, then the next model inherits – and possibly compounds – the alignment failures of its predecessor.

Compromised infrastructure, financial chaos, healthcare disruption: these are the risks of this model. The next model will be more capable, more autonomous and harder to control. And critically, it will be built by a system that Anthropic itself admits it cannot fully trust.

Professor Stuart Russell, the author of the AI textbook used in universities throughout the world, believes it will take a Chernobyl-scale AI disaster for policymakers to finally take this risk seriously. What is happening with Mythos may be the last warning shot we get before an actual catastrophe.

Demand a pause in AI development

Binding democratic oversight, national regulation and international treaty frameworks cannot come soon enough.

Our single most important policy demand is not just to regulate the deployment of dangerous AI, but to pause the development of the next generation until alignment is solved and governance is in place. Once the next model exists, stopping will be harder. Once the model after that exists, it may be impossible.

Superintelligent AI is not inevitable. The nuclear arms race was slowed. The ozone layer was saved. But it required organised people demanding change, and decision makers willing to act.

The number of politicians around the world who recognise the AI threat is growing. In February we heard the concerns of European politicians. American politicians, including Bernie Sanders, are now publicly in favour of a pause.

You can make a difference. Pressure decision makers: write to your politician. Tell them you do not want unregulated companies making decisions that will affect your life and the lives of everyone around you.

We need to organise, and quickly. PauseAI has local chapters around the world and a task for everyone.

Join PauseAI. Find your local chapter. Protest.

Every day without a pause is a day closer to an AI system we cannot control.

Read PauseAI’s technical analysis of Mythos
