Containing AI Agents: The Endo Familiar Demo - Decentralized Cooperation Foundation (DCF)


When you authorize an AI agent today, what exactly are you authorizing?

The moment you authorize an AI agent, you’re making a bet. A bet that the model won’t be fooled. That the pipeline won’t be compromised. That a malicious instruction buried in a document the agent reads won’t redirect it toward something you never asked for. Prompt injection is already a live attack vector. And most agent frameworks are handing the attacker everything they need to do real damage.

That’s a bad bet. And it’s being made at scale, across critical infrastructure, right now.

The Credential Bag Problem

The security model behind tools like OpenClaw boils down to putting all of your permissions, all of your credentials, all of your authority into a bag and then handing that bag to an AI agent to do with as it will.

That framing should stop you cold.

If an AI agent has access to your file system, it has access to your SSH keys. If it can read your calendar, it can probably read your email. If it touches your API credentials, it touches all of them. These systems aren’t architected around the concept of partial trust. They’re designed around convenience.

This isn’t a theoretical vulnerability waiting to happen. AI agents already go off the rails. They hallucinate instructions. They can be hijacked through prompt injection. The alignment isn’t perfect, and it won’t be for a long time, if ever. The question isn’t whether an agent will someday do something unintended. The question is how much damage it can cause when it does.

Right now, the answer is: potentially everything it can reach with what you gave it.

The Time is Now

The deployment curve for AI agents is steep and accelerating. Developers are integrating agents into codebases, inboxes, calendars, cloud environments, and production systems. Most of this integration is happening before anyone has seriously addressed the security foundation underneath it.

Social media made the same mistake a decade ago. Applications ran with the full privileges of their users. If you could delete your files, a compromised app could too. If you could message your contacts, so could a malicious extension. The architecture made certain disasters inevitable, and sure enough, they happened. The industry built on a broken foundation, and spent years trying to patch the consequences.

AI agents are a structurally identical problem. The time to get the foundation right is before the breaches, not after.

A Different Model: Authority Follows References

The Endo framework, built on HardenedJS and the object-capability security model, offers a fundamentally different approach. The core principle is this: a reference is authority. If your code holds a reference to an object, it can use it. If it doesn’t have the reference, it can’t forge one. There’s no ambient pool of permissions to accidentally expose. Authority flows exactly where you pass it, and nowhere else.
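The principle can be sketched in a few lines of ordinary JavaScript. This is an illustrative toy, not Endo’s actual API: the closure-held reference is the authority, nothing can forge it, and it must be passed explicitly.

```javascript
// Hypothetical sketch (not the Endo API): a reference is authority.
// The `entries` array is reachable only through the returned object.
function makeLogger() {
  const entries = [];
  return Object.freeze({
    log: (msg) => { entries.push(msg); },
    read: () => entries.slice(),
  });
}

const logger = makeLogger();

// A task handed only `log` can append, but can never read or erase.
// There is no ambient path back to `entries` or to `read`.
const appendOnly = Object.freeze({ log: logger.log });
```

Code that was never passed `logger` cannot conjure it out of thin air; its authority is exactly the set of references it holds.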

This sounds abstract until you watch it in action.

In our demo, an AI agent is spawned with a single capability: the ability to read a primer document that tells it how to operate. That’s it. No file system access. No network access. No credentials. The agent exists in the system, functional and responsive, but can only reach what it’s explicitly been handed.

When the demo calls for the agent to work with files, a specific file system capability is created. Not “access to the file system,” but a mount of a specific directory that cannot traverse above its root, cannot follow symbolic links out of its tree, and cannot escape its boundaries by construction. That mount gets handed to the agent. The agent now has exactly what it needs to do the job, and nothing more.

The demo goes further. The agent is asked to write a program that produces a read-only view of a directory. The generated code gets reviewed. It runs in a sandbox with no ambient capabilities. The output is a new, narrower capability derived from the original, and that narrowed capability is handed back to the agent. At each step, the scope of what the agent can touch shrinks to match the task.
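This narrowing step is the classic object-capability pattern called attenuation. A hedged sketch, again in plain JavaScript with hypothetical names: the read-only view simply never holds a reference to the write methods, so write authority is unreachable rather than merely forbidden.

```javascript
// Hypothetical sketch: derive a strictly narrower capability from a
// broader one. No policy check is needed; `writeFile` is just absent.
function makeReadOnlyView(dirCap) {
  return Object.freeze({
    readFile: (rel) => dirCap.readFile(rel),
  });
}

// A broader capability (stubbed here for illustration):
const readWrite = Object.freeze({
  readFile: (name) => `contents of ${name}`,
  writeFile: (name, data) => { /* ...mutate state... */ },
});

const readOnly = makeReadOnlyView(readWrite);
```

Handing `readOnly` back to the agent shrinks its reach to match the task, exactly as the demo does with the derived directory view.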

This is the Principle of Least Authority in practice. Not as a policy document, but as a structural guarantee.

The Pet Name System: Making Capability Human-Readable

One of the subtler problems with capability-based security is that cryptographic identifiers are unreadable to humans. A suitably large random number is how you address a capability securely over a network. It’s also completely opaque to anyone trying to understand what they’re authorizing.

Endo solves this with a pet name system. Capabilities in the Familiar have user-assigned names. “Scratch,” “loldir,” “lolrodir.” Your names for things in your inventory. You recognize them at a glance, see what you’ve granted to whom, and revoke access by removing the reference.
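The shape of such an inventory can be sketched as a simple name-to-capability map (hypothetical code, not the Familiar’s implementation). Note that dropping a name prevents future lookups; fully cutting off access that has already been passed along is done with a revocable forwarder, another standard object-capability pattern.

```javascript
// Hypothetical sketch of a pet-name inventory: human-chosen names
// bound to capabilities, with grants and revocation as map operations.
function makeInventory() {
  const byName = new Map();
  return Object.freeze({
    assign: (petName, cap) => { byName.set(petName, cap); },
    lookup: (petName) => {
      if (!byName.has(petName)) throw new Error(`unknown pet name: ${petName}`);
      return byName.get(petName);
    },
    revoke: (petName) => byName.delete(petName),
    list: () => [...byName.keys()],
  });
}
```

You see at a glance what “scratch” or “loldir” names, and you reason about grants in your own vocabulary instead of in random numbers.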

The demo extends this across a network. A colleague connects over a WebSocket relay using an invite-accept model and shares a directory from their machine. It gets adopted into the local inventory under a chosen name, then passed to the agent. The agent receives a file system mount capability, reads the contents, and summarizes them. It has no idea whether the directory is local or remote, and it doesn’t need to. It holds the capability; the transport is irrelevant.

What’s being demonstrated here is something more significant than a clever UI. It’s a social, capability-based computing model where authority is granted explicitly, named meaningfully, and passed deliberately between agents and people on a decentralized network.

AI-Written Code as a Security Surface

There’s a moment in the demo that deserves particular attention. The agent is asked to write a program that creates a read-only file system view. The agent produces code. It gets reviewed. It runs inside a compartment with no capabilities beyond what’s needed for that specific task.

The compartment gives the user confidence that, regardless of whether the AI-written program does what was asked, it can only mishandle the capabilities it was granted. It runs, but in an environment that clamps the worst case to abuse of those specific capabilities, not spooky action at a distance. The safety mechanism is the compartment, not trust in the model.

This is the right mental model for AI-written code at scale. Models will write bugs. They will occasionally produce subtly incorrect logic. In some threat scenarios, they will be the vector for an attack. None of that is a dealbreaker if the execution environment structurally limits the blast radius. The goal isn’t to produce perfect AI output. It’s to make imperfect AI output safe to run.

What This Means for Developers and Builders

If you’re building with AI agents, the questions worth asking are uncomfortable ones.

If an agent’s inference engine is compromised, what can it reach? If the model makes a catastrophic mistake and tries to delete files or exfiltrate data, what stops it? If a supply chain attack compromises a dependency the agent harness relies on, what’s the blast radius?

Most current frameworks don’t have good answers. They’re built for capability and convenience, not for containment.

The Endo Familiar is a prototype of what the alternative looks like. Although it is just a proof of concept, the core architecture is sound and has been production-tested through Agoric’s smart contract platform and MetaMask’s extensible wallet system. These are not research environments. They’re handling real economic value, running untrusted third-party code, and doing it safely through the same object-capability model.

The patterns exist. The framework’s in place. The question is whether the people building AI-powered systems will adopt secure foundations now, while adoption curves are still forming, or wait until a high-profile breach forces the conversation.

The architecture shown in our demo isn’t speculative. It runs today. The Familiar is a working prototype of a user agent that treats AI agents with the same suspicion a web browser applies to arbitrary JavaScript: useful, often trustworthy, but never unconditionally so. Authority is granted in controlled amounts. Capabilities are named, scoped, and revocable. Damage from failure is bounded by construction.

If you’re a builder using AI agents, this is a good time to investigate Endo’s Familiar and begin building your automation on an extensible security framework. The integration points exist. The architecture is proven. Don’t wait for a breach to force the conversation.

Ready to build agents that fail safely? Explore Endo on GitHub at github.com/endojs/endo, dig into the Familiar, and start architecting your AI integrations on a foundation that contains damage by construction, not by hope.

Check out our blog covering DCF’s Foresight Institute grant for AI safety

More information on Endo can be found at Endojs.org