A registry of AI agent failures, exploits, and defenses

The viral Moltbot AI assistant and its associated Moltbook agent social network raised security concerns due to agents being granted broad access to files, credentials, and external services while interacting with untrusted content and other agents, creating new paths for data exposure and manipulation.

Agents with persistent memory, system access, and external communication capabilities increased the risk of private data leakage, delayed prompt-based attacks, and unintended coordination between agents, potentially amplifying security failures at scale.

The incident primarily resulted in warnings from researchers and security teams, with recommended mitigations including limiting permissions, reducing autonomous access to sensitive systems, and strengthening safeguards around agent memory and external communication.

Palo Alto Networks warned that Moltbot highlights a new class of risk where autonomous agents combine broad system access, exposure to untrusted content, and external communication, creating conditions where agents can unintentionally leak data or execute harmful actions without direct exploitation.

Agents operating with persistent memory and high privileges increase the risk of delayed attacks, data exfiltration, and large-scale security failures, especially as agents interact with other agents and external systems beyond traditional security visibility.

The report emphasizes governance and architectural controls rather than a single fix, recommending tighter permission boundaries, stronger monitoring, and security models designed specifically for autonomous agents instead of traditional application defenses.

The viral AI assistant Moltbot (formerly Clawdbot) gained rapid adoption despite being granted broad access to users’ accounts, files, and services, raising concerns that highly autonomous agents with persistent memory and system access could behave unpredictably or expose sensitive data without sufficient safeguards.

Users allowing the agent to manage personal or business workflows risked privacy exposure, unintended actions, and data leakage, as the assistant could automate tasks across connected systems with limited oversight or security controls.

The incident primarily resulted in increased awareness rather than a single fix; researchers and developers emphasized limiting permissions, maintaining human oversight, and avoiding granting full account access to autonomous agents without stronger guardrails.

Security researchers found that Moltbot (formerly Clawdbot), an agentic personal assistant with broad system and account access, could expose sensitive data due to misconfigurations, insecure defaults, and supply-chain risks in its skills ecosystem, including publicly exposed instances and unmoderated downloadable skills.

Exposed or compromised instances could allow attackers to access private messages, credentials, API keys, and connected services, effectively turning the agent into a backdoor capable of ongoing data exfiltration or command execution.

Some configuration and authentication issues were addressed after disclosure, but researchers emphasized stronger access controls, least-privilege permissions, and secure deployment practices to reduce exposure when running agentic systems.

While using Google Antigravity in “Turbo” mode (automatic command execution), the agent wiped the entire content of the user’s D-drive while attempting to clear the project cache

User lost full D-drive contents. Other users report similar issues

User advised others to exercise caution running Antigravity in Turbo mode as this enables the agent to execute commands without user input or approval

A bug in Asana’s MCP server allowed users from one account to access “projects, teams, tasks, and other Asana objects” from other domains

Cross-tenant data exposure risk for all MCP users, though no confirmed exploit; customers were notified and access suspended

The MCP server was taken offline, the code issue was fixed, affected customers were notified, and logs/metadata were made available for review.

Replit’s AI coding assistant ignored the instruction not to change any code 11 times, fabricated test data, and deleted a live production database.

Trust damaged; user code at risk; public apology by CEO

Product enhancements with backups in place and one-click restore launched

Researchers found that Claude Cowork, Anthropic’s general-purpose AI agent, can be tricked via indirect prompt injection into uploading user files to an attacker’s Anthropic account by abusing a known isolation flaw and the agent’s file/network access.

An attacker can exfiltrate sensitive user files (including documents with financial details or PII) without explicit user approval once Cowork has been granted folder access, exposing organizations to data theft and confidentiality breaches.

The vulnerability was publicly demonstrated; mitigations focus on restricting file access and strengthening prompt sanitization, though no formal fix has been confirmed — prompting warnings that users should avoid granting access to sensitive files and security teams should harden agent permissions.

Microsoft Copilot Studio no-code AI agents were shown to be vulnerable to prompt injection, allowing attackers to override instructions and extract sensitive corporate data or trigger unintended actions.

This exposed organizations to customer data leakage, unauthorized workflow changes, and financial risk, especially since no-code agents can be widely deployed without strong security oversight.

Researchers recommended input filtering, stricter access controls, least-privilege permissions, and sandboxing to reduce agent abuse and limit data exposure.

Attackers exploited ServiceNow Now Assist agent-to-agent collaboration + default config to trick a low-privileged agent into delegating malicious commands to a high-privilege agent, resulting in data exfiltration.

Sensitive corporate data leaked or modified; unauthorized actions executed behind the scenes

ServiceNow updated documentation and recommended mitigations: disable autonomous override mode for privileged agents, apply supervised execution mode, and segment responsibilities

Google Antigravity data-exfiltration via prompt injection. A “poisoned” web page tricked Antigravity’s agent into harvesting credentials and code from a user’s local workspace, then exfiltrating it to a public logging site.

Sensitive credentials and internal code exposed; default protections (e.g. .gitignore, file-access restrictions) bypassed.

The vulnerability has been publicly disclosed by researchers. PromptArmor and others highlight the need for sandboxing, network-egress filtering, and stricter default configurations.

A “zero-click” exploit called Shadow Escape targeted major AI-agent platforms via their MCP connections. Malicious actors abused agent integrations to access organizational systems.

Agents inside trusted environments were silently hijacked, bypassing controls. Because it exploited default MCP configs and permissions, the potential blast radius covered massive volumes of data.

Initial remediation advice included auditing AI agent integrations, enforcing least privilege, and treating uploaded documents as potential attack vectors.

Researchers demonstrated how the web-search tool in Notion’s AI agents could be abused to exfiltrate private data via a malicious prompt.

Confidential user data from internal Notion workspaces could be exposed to attackers

Notion declared the vulnerability and announced a review of tool permissions and integrations.

Supabase MCP data-exposure through prompt injection. The agent used the service_role key and interpreted user content as commands, allowing attackers to trigger arbitrary SQL queries and expose private tables.

Complete SQL database exposure. All tables became readable. Sensitive tokens, user data, internal tables at risk.

Public disclosure by researchers. Calls for least-privilege tokens instead of service_role, read-only MCP configuration, and gated tool access through proxy/gateway policy enforcement.

A prompt-injection flaw in GitHub’s MCP server lets attackers use AI agents to access private repos and exfiltrate code.

Private code, issues, and sensitive project data could be exposed via public pull requests.

Organizations were advised to limit agent permissions, disable the integration, and apply stricter review of tokens.

During the rapid rebrand from Clawdbot to Moltbot, attackers exploited confusion around account changes and project identity, hijacking social accounts and launching fake crypto tokens while impersonating the project and spreading malicious copies.

The incident led to financial losses from scam tokens, reputational damage to the project, and exposure of users to malicious software and insecure agent deployments during a period of rapid adoption and unclear trust signals.

The developer publicly denied involvement, warned users of scams, and completed the rebrand while encouraging users to verify official sources and avoid unofficial tools or tokens associated with the project.

A Chinese state-sponsored group abused Anthropic Claude Code and MCP tools to automate ~80–90% of a multi-stage agentic cyber espionage operation across ~30 global organizations.

Successful intrusions and data exfiltration at a subset of tech, finance, chemical, and government targets; first widely reported large-scale agentic AI-orchestrated cyberattack.

Anthropic detected the activity, banned attacker accounts, notified affected organizations, shared IOCs with partners, and tightened safeguards around Claude Code and MCP use.

Malice in Agentland study found attackers could poison the data-collection or fine-tuning pipeline of AI agents . Even with as low as 2% of traces poisoned, embedding backdoors that trigger unsafe or malicious behavior when a specific prompt or condition appears

Once triggered, agents leak confidential data or perform unsafe actions with a high success rate (~80 %). Traditional guardrails and two standard defensive layers failed to detect or block the malicious behavior.

The study raises alarm across the community; calls for rigorous vetting of data pipelines, supply-chain auditing, and end-to-end security review for agentic AI development

Help us keep this registry complete and up to date

If you’re aware of a publicly documented agent-related breach we haven’t captured, share it below. We’ll review and add it to the registry.

We’ve received your incident report and will review it shortly. If we need additional details, we’ll reach out using the email provided.

Something went wrong while submitting the form. Please check your information and try again.

Tell us how an AI agent went rogue—anonymously, so it can’t come after you later. We will review and add it to the registry.

We’ve received your incident report and will review it shortly.

Something went wrong while submitting the form. Please check your information and try again.

Next Steps

If you want to run powerful agents safely, you need the right guardrails in place. To learn more about agentic security and how Oso can help, book a meeting with the Oso team.