How the ‘Lethal Trifecta’ sets the conditions for stealing data on command

COMMENTARY: A new class of security failure has already taken hold as AI agents push deeper into production. Security blogger Simon Willison's post earlier this year on the “lethal trifecta” crystallized this growing concern.

Willison said an AI agent becomes dangerous when three conditions coexist: untrusted human input, access to sensitive data, and a path to exfiltrate information.

[SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Read more Perspectives here.]

If an attacker can trigger all three, they can manipulate agents into stealing private data on command.

Many teams still think of agents as upgraded chatbots that follow explicit user instructions, not as autonomous and non-deterministic entities that can act on their own.

But as AI agents move into production, this trifecta becomes a structural risk that traditional controls were never designed to handle. Here’s where authorization can act as a defining control layer, capable of breaking the trifecta before it becomes an attack.

Moving forward, keep these three points in mind:

Untrusted input exists everywhere: LLMs cannot reliably distinguish instructions from content. To the model, it all mixes together in the same soup. That means any material an agent encounters can influence its decisions, whether it comes from a user’s prompt or from a random webpage it scrapes while carrying out a task. As a result, an agent acting for a trusted user can still get steered by adversary-poisoned context. We’ve seen this already. In Google’s Antigravity incident, a poisoned webpage injected instructions into the agent’s context. Unable to separate malicious input from legitimate task data, the agent harvested credentials and source code and exfiltrated them to a public logging site. Agents with memory amplify the risk. An attacker can plant malicious instructions that lie dormant until a legitimate user later triggers the agent. It’s easy to trust the user; but it’s tougher to trust the full context their agent receives – and that’s where attacks hide.

Think of sensitive data access as a given: Most useful agents must interact with sensitive systems such as HR, CRM, internal databases, and source code. Removing that access neuters their value. Consider a customer support agent that requires users to re-enter their account and order data because the agent has no access to the backoffice system. The Replit incident around shows the dangers. The company’s coding assistant ignored instructions, fabricated test data, altered files, and ultimately deleted a live production database. The issue wasn’t malice; it was broad, coarse-grained permissions combined with non-deterministic behavior. Useful agents require sensitive data access, so we must control the other two trifecta elements with far greater discipline.

Exfiltration vectors are easy to miss: Agents often have outbound channels by design, for example email, Slack, database updates, or generating content for external systems. But exfiltration can also occur unintentionally. An embedded image, for example, can leak data through URL parameters. Shadow Escape highlighted this risk. Attackers exploited overly permissive Model Context Protocol (MCP) configurations, turning agents into high-speed exfiltration bots that silently extracted sensitive records. No alerts fired because the agents were behaving in ways that appeared operationally normal, not malicious. With agents, the same tool that enables useful work can just as easily become an unintended exfiltration path.

The lethal trifecta represents a structural risk that demands deterministic, fine-grained controls enforced outside the model itself. LLMs are probabilistic and manipulable by design. This makes them powerful, but also fundamentally unsuitable as their own security guardians.

Any attempt to use LLMs to reliably detect or block manipulation inherits the same non-determinism and failure modes as the system it’s meant to protect. And that's not good enough for security, where Willison notes that 99% is a failing grade. Authorization offers that deterministic external control layer through:

Task-bound access: Agents should receive only the narrowly scoped permissions required for the task they are executing, for example, read access to a single record rather than an entire table.

Total visibility: Every agent action must be observable. When an agent behaves unexpectedly, teams need an immediate signal.

Instant containment: When behavior goes sideways, teams must quarantine an agent in real time.

Centralized control: Permissions must live in one auditable, testable place. Fragmented configuration across agents and tools guarantees drift and inconsistent enforcement.

The autonomy, speed, and non-deterministic behavior of agents make them both powerful and dangerous in ways the industry has only begun to grasp. The lethal trifecta reminds us that attackers don’t need to compromise our systems; they only need to convince the agent.

Managing the lethal trifecta requires engineering and security working in lockstep. Authorization gives them the shared control point to do it.

Gabe Jackson, founding engineer, Oso

SC Media Perspectives columns are written by a trusted community of SC Media cybersecurity subject matter experts. Each contribution has a goal of bringing a unique voice to important cybersecurity topics. Content strives to be of the highest quality, objective and non-commercial.