The Deep Noodle Blog | Sandboxing AI Coding Agents

Sandboxing AI Coding Agents

If you're running Claude Code, Codex, or Gemini CLI, do you know what they can actually do on your machine? Can the agent exfiltrate your SSH keys? Send your environment variables to an external server? Modify your shell config to run something malicious next time you open a terminal?

This uncertainty bothered me enough to dig in. All three CLIs have sandboxing capabilities that provide safety mechanisms, but you may or may not have them enabled. The good news is that enabling sandboxing is straightforward and rarely slows you down. But you need to understand what it protects against and where the gaps are.

This post covers the real risks, how each CLI implements sandboxing, and what to configure before you trust them with your codebase.

The Risks Are Real

If you're worried about using AI agents for development, your concerns are legitimate.

Secret exposure. Environment variables containing API keys, database passwords, and cloud credentials are accessible to the model. Sandboxing doesn't automatically protect them. They live in memory and are inherited by child processes unless explicitly blocked. Don't assume sandboxing handles this.

Prompt injection. Malicious instructions can easily be embedded in code comments, README files, or package documentation. When the agent ingests this content, it might follow the instructions. This is OWASP's #1 risk for LLM applications, and it cannot be fully solved at the model level. I've reproduced jailbreaks myself on Claude Opus 4.5 running in Claude Code.

Permission fatigue. Reddit threads are full of engineers admitting they click "approve" reflexively, or use --dangerously-skip-permissions because the friction is unbearable. One user put it bluntly: "Format my hard drive if you want... JUST DON'T MAKE ME CONFIRM ANOTHER BASH COMMAND!!!"

Accidental damage. Engineers may approve a command that accidentally trashes their code changes or even their entire development system. Recovery depends entirely on git discipline and backup practices.

What Does Sandboxing Mean in this Context?

Sandboxing runs a process in an isolated environment with restricted capabilities, with the goal of constraining what actions it can take.

The big three coding agents (Anthropic's Claude Code, OpenAI's Codex, and Google's Gemini CLI) implement sandboxing similarly, but with different defaults and nuances.

Sandboxing in these CLIs implements two main types of boundaries:

Filesystem isolation. What files can the agent read and write? Can it read your private keys in ~/.ssh? Can it modify files outside your project directory? Can it write to shell config files like .bashrc?

Network isolation. What can the agent make network requests to? Can it make API calls to services you haven't approved? Even an HTTP GET request can exfiltrate secrets via the URL path or query parameters.

Sandboxing operates at a lower, more fundamental level than the permission prompts you often see when using these CLIs. The permission prompts depend on the user making the right choice in the moment, while sandboxing doesn't.

You may or may not have sandboxing enabled right now. Don't assume. Check.

How Each Tool Implements Sandboxing

All three tools use OS-level isolation. Here's how they compare:

Aspect	Claude Code	Codex	Gemini CLI
Default sandbox state	Disabled	Enabled	Disabled
Linux sandbox	Bubblewrap	Landlock + seccomp	Docker/Podman
macOS sandbox	Seatbelt	Seatbelt	Seatbelt
Windows support	No	Experimental	Yes (Docker)

As you can see, it's important to review your sandbox state in each tool to make sure you've opted in to sandboxing.

In my testing, your shell environment variables are always available in commands issued by the agents, regardless of sandboxing settings. You may want to customize your shell environment accordingly if you want additional protections on this front.

All three CLIs on macOS rely on a tool called sandbox-exec that Apple has marked deprecated. This is worth watching to see if it becomes problematic.

Quick Start

Claude Code (sandboxing docs) — Run /sandbox and choose a mode:

/sandbox

Codex (sandboxing docs) — Sandboxing is on by default. You can confirm by entering /status. The command line options include:

# Convenience alias for sandboxed automatic execution
codex --full-auto

# Or configure explicitly
codex --sandbox workspace-write --ask-for-approval on-request

Gemini CLI (sandboxing docs) — Enable sandbox mode explicitly:

gemini --sandbox

Not a Silver Bullet

Think of sandboxing as one layer of defense, not a complete solution. Here are some things to keep in mind, even when you have sandboxing enabled.

Domain allowlisting is coarse. If you allow github.com, the agent can push to any repo you have access to. If you allow npmjs.com, it could publish a package.

Trusted code can be compromised. If a dependency contains adversarial instructions in comments, the agent will see and potentially follow them. The sandbox limits response, but cannot prevent the agent from being influenced more generally.

Insecure code generation. Even without malicious intent, AI agents can generate code with vulnerabilities. Sandboxing doesn't help here. Code review does.

Escape hatches exist. Every tool provides --yolo or danger-full-access modes. Your actual security is only as strong as your team's discipline in avoiding these modes, or using them only in carefully constructed environments.

Security bugs happen. Security is hard to get right. All three tools have had vulnerabilities discovered and patched. For example: [1] [2] [3]

The good news: Codex and Gemini CLI are fully open source, and Claude Code's sandbox implementation is open source (though the rest of Claude Code is not). Security fixes happen in public. Keep your CLIs updated.

Recommendations

By risk profile:

Regulated industry. Start with read-only mode. Enable writes only for approved projects. Document configuration for compliance.
Heavy open-source use. Extra caution for prompt injection via package READMEs. Stricter network controls.
High isolation needs. Run the agent inside a Docker container or VM. If something goes wrong, blow away the container.

Universal:

Keep CLI versions updated
Good git hygiene: feature branches, frequent commits, review diffs before pushing
Test your sandbox before trusting it. Codex provides codex sandbox macos <command> to verify behavior.
Avoid YOLO mode. Typically the productivity gain is marginal; the risk difference is not.

The Bottom Line

Sandboxing doesn't make AI coding assistants entirely safe, but it does give you guarantees where you'd otherwise have none.

Familiarize yourself with the sandbox settings and how to adjust them according to the risk profile associated with your active work.

Be sure to keep your CLIs updated, since security patches come out regularly.

References

Official Documentation:

Security Research & Incidents:

Open Source: