Report: CLTR finds a 5x increase in scheming-related AI incidents


It has long been theorised that AI systems may pursue harmful goals in ways that evade oversight or control. In the worst case, this type of behaviour – sometimes known as ‘scheming’ – could lead to catastrophes.

While today’s AI agents are engaged in lower-stakes use cases, in the future they could end up scheming in extremely high-stakes domains, such as military or critical national infrastructure contexts, if the capability and propensity to scheme emerge and are not addressed.

Our understanding of this risk has so far been limited to observations in experiments. While these experiments have raised important alarms, they have also faced legitimate criticism: the experimental set-ups are sometimes contrived, and their relevance to real-world deployments is uncertain.

As AI capabilities continue to grow, so will the need for better visibility over whether and how scheming is materialising in the real world. This is crucial for scientific understanding, effective policy development, and emergency response. This is why we created the Loss of Control Observatory – the first capability of its kind to systematically detect and monitor ‘AI scheming’ behaviours across all AI models in deployment.

Today, we are publishing a major report setting out findings from the first five months of the Observatory.

What we found

Through an analysis of over 180,000 transcripts of user interactions with AI systems that were shared on X between October 2025 and March 2026, we identified 698 scheming-related incidents: cases where deployed AI systems acted in ways that were misaligned with users’ intentions and/or took covert or deceptive actions.

We find evidence of multiple scheming or scheming-related behaviours occurring in real-world deployments that had previously been reported only in experimental settings; many of these incidents resulted in real-world harms.

The trend is striking. The number of credible scheming-related incidents increased 4.9x over the collection period, a statistically significant increase that far outpaced the 1.7x growth in overall online discussion of scheming, and the 1.3x growth in general negative discussion about AI. This surge coincided with the release of a wave of more capable, more agentic AI models and frameworks from major developers.
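
As a back-of-the-envelope way to read that comparison (the full report details the actual statistical analysis), dividing the incident growth by each baseline’s growth gives the excess growth over and above the rise in background discussion:

```python
# Growth multiples quoted above (end of collection period vs start).
incident_growth = 4.9          # credible scheming-related incidents
scheming_discussion = 1.7      # overall online discussion of scheming
negative_ai_discussion = 1.3   # general negative discussion about AI

# Excess growth after normalising by each baseline: how much faster incidents
# grew than the background volume of relevant posts.
print(f"vs scheming discussion:    {incident_growth / scheming_discussion:.1f}x")
print(f"vs negative AI discussion: {incident_growth / negative_ai_discussion:.1f}x")
```

Even after normalising for the growth in online discussion, incident counts grew roughly three to four times faster than either baseline.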

While we did not detect catastrophic scheming incidents, the behaviours we observed nonetheless demonstrate concerning precursors to more serious scheming, such as a willingness to disregard direct instructions, circumvent safeguards, lie to users and single-mindedly pursue a goal in harmful ways.

Incidents included an AI model sustaining a months-long deception about its activities, an agent publishing a ‘hit-piece’ criticising a developer on a blogging site after he rejected its proposed change to a software library, and a model that circumvented copyright restrictions by deceiving another AI model with the false claim that it was producing an accessibility transcript for people with hearing loss.

We also identify novel behaviours not yet described in scheming research, including potential evidence of an AI model attempting to deceive another AI model that was tasked with summarising its reasoning – a form of inter-model scheming that raises questions about the reliability of chain-of-thought monitoring as a safety technique.

The future of AI is deeply uncertain, but as AI systems become more capable, these behaviours could evolve into more strategic, high-risk scheming with potentially catastrophic consequences.

What it means

The good news is that the catastrophic scheming scenarios of greatest concern to AI security researchers do not yet appear to be occurring. Most incidents remain contained in severity: AI agents today interact primarily with code, data, and software infrastructure, where the consequences of misaligned and/or covert actions, while disruptive, are often recoverable.

But the pattern of behaviour captured by the Observatory is troubling. Across hundreds of incidents, we see precisely the precursor behaviours that, as AI systems become more capable and are entrusted with more consequential tasks, could evolve into more strategic, high-stakes scheming that could lead to a loss of control emergency.

The severity of harms from scheming is a function not just of how often models exhibit these behaviours, but also of their capability level and the scope of what we entrust to them. The shift from AI agents interacting primarily with codebases to AI agents operating across critical national infrastructure, financial systems, and physical processes is one pathway to a higher level of risk from scheming; the continued rapid increase in the capability of AI systems is another.

What should happen next

This research demonstrates that real-world scheming detection is both viable and urgently needed. In the same way that monitoring wastewater for emerging pathogens can identify threats before they develop into full-blown pandemics, systematic monitoring of AI behaviours in the wild can identify harmful patterns before they become more destructive.

No actor currently monitors real-world scheming incidents across all AI models. Existing incident databases, while valuable, are too slow to serve as an effective early-warning system.

Governments have an opportunity to develop world-leading situational awareness of scheming behaviours and agentic risks more broadly, by investing in real-world AI scheming detection as a sovereign capability. In the UK, this could build on the existing work of the UK AI Security Institute and our prototype Loss of Control Observatory.

Future efforts should address current limitations of this methodology: extending monitoring beyond X to platforms like GitHub and Reddit, developing methods to distinguish genuine increases in scheming propensity from increases in reporting and opportunity, considering other intelligence sources for monitoring, and improving techniques for differentiating scheming from mundane malfunctions.

How we did it

The Loss of Control Observatory uses a novel open-source intelligence (OSINT) methodology to collect and analyse transcripts of real interactions with AI systems, such as chatbot conversations and command-line logs shared publicly online. Over five months (October 2025 to March 2026), we analysed over 183,000 transcripts scraped from X (formerly Twitter), using a pipeline of automated screening, LLM-assisted classification, and manual review to identify credible evidence of scheming-related behaviours. We provide extensive detail on our methodology in the full report, including on its limitations and how we address them.
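
As a rough sketch of how a pipeline with those three stages might be structured (illustrative only: the transcript fields, screening terms, and classifier stub below are hypothetical placeholders, not the Observatory’s implementation):

```python
from dataclasses import dataclass, field

# Hypothetical screening terms for the illustration; the report's actual
# screening criteria and taxonomy are described in the methodology section.
SCREENING_TERMS = ("deceive", "covert", "circumvent", "ignored instructions")

@dataclass
class Transcript:
    source_url: str
    text: str
    labels: list[str] = field(default_factory=list)

def automated_screen(t: Transcript) -> bool:
    """Cheap first pass: keep only transcripts mentioning scheming-related terms."""
    lowered = t.text.lower()
    return any(term in lowered for term in SCREENING_TERMS)

def llm_classify(t: Transcript) -> str:
    """Placeholder for LLM-assisted classification. In practice this would call
    a model API with a rubric; here it simply reuses the keyword screen."""
    return "possible_scheming" if automated_screen(t) else "benign"

def build_manual_review_queue(transcripts: list[Transcript]) -> list[Transcript]:
    """Automated screening -> LLM-assisted classification -> manual review queue."""
    queue = []
    for t in transcripts:
        if not automated_screen(t):
            continue
        label = llm_classify(t)
        if label == "possible_scheming":
            t.labels.append(label)
            queue.append(t)
    return queue

if __name__ == "__main__":
    sample = [Transcript("https://example.com/post/1",
                         "The agent tried to circumvent the sandbox and hid its actions.")]
    print([t.source_url for t in build_manual_review_queue(sample)])
```

Only the transcripts that survive all three stages are logged as candidate incidents for human reviewers to assess.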

Most existing AI incident databases depend heavily on news coverage and therefore skew towards incidents that are dramatic, easily understood, or involve measurable harm such as death. By contrast, many scheming-related behaviours are too technical, too niche, or too novel to attract media attention, but may nonetheless be important precursors to more serious risks. Our methodology is designed to catch precisely these kinds of incidents.

This report was produced with the support of the AI Security Institute through the Institute’s Challenge Fund.
