Sandboxing AI Coding Agents
If you're running Claude Code, Codex, or Gemini CLI, do you know what they can actually do on your machine? Can the agent exfiltrate your SSH keys? Send your environment variables to an external server? Modify your shell config to run something malicious next time you open a terminal?
This uncertainty bothered me enough to dig in. All three CLIs have sandboxing capabilities that provide safety mechanisms, but you may or may not have them enabled. The good news is that enabling sandboxing is straightforward and rarely slows you down. But you need to understand what it protects against and where the gaps are.
This post covers the real risks, how each CLI implements sandboxing, and what to configure before you trust them with your codebase.
The Risks Are Real
If you're worried about using AI agents for development, your concerns are legitimate.
Secret exposure. Environment variables containing API keys, database passwords, and cloud credentials are accessible to the model. Sandboxing doesn't automatically protect them. They live in memory and are inherited by child processes unless explicitly blocked. Don't assume sandboxing handles this.
Prompt injection. Malicious instructions can easily be embedded in code comments, README files, or package documentation. When the agent ingests this content, it might follow the instructions. This is OWASP's #1 risk for LLM applications, and it cannot be fully solved at the model level. I've reproduced jailbreaks myself on Claude Opus 4.5 running in Claude Code.
Permission fatigue. Reddit threads are full of engineers admitting they click "approve" reflexively, or use --dangerously-skip-permissions because the friction is unbearable. One user put it bluntly: "Format my hard drive if you want... JUST DON'T MAKE ME CONFIRM ANOTHER BASH COMMAND!!!"
Accidental damage. Engineers may approve a command that accidentally trashes their code changes or even their entire development system. Recovery depends entirely on git discipline and backup practices.
What Does Sandboxing Mean in this Context?
Sandboxing runs a process in an isolated environment with restricted capabilities, with the goal of constraining what actions it can take.
The big three coding agents (Anthropic's Claude Code, OpenAI's Codex, and Google's Gemini CLI) implement sandboxing similarly, but with different defaults and nuances.
Sandboxing in these CLIs implements two main types of boundaries:
Filesystem isolation. What files can the agent read and write? Can it read your private keys in ~/.ssh? Can it modify files outside your project directory? Can it write to shell config files like .bashrc?
Network isolation. What can the agent make network requests to? Can it make API calls to services you haven't approved? Even an HTTP GET request can exfiltrate secrets via the URL path or query parameters.
Sandboxing operates at a lower, more fundamental level than the permission prompts you often see when using these CLIs. The permission prompts depend on the user making the right choice in the moment, while sandboxing doesn't.
You may or may not have sandboxing enabled right now. Don't assume. Check.
How Each Tool Implements Sandboxing
All three tools use OS-level isolation. Here's how they compare:
| Aspect | Claude Code | Codex | Gemini CLI |
|---|---|---|---|
| Default sandbox state | Disabled | Enabled | Disabled |
| Linux sandbox | Bubblewrap | Landlock + seccomp | Docker/Podman |
| macOS sandbox | Seatbelt | Seatbelt | Seatbelt |
| Windows support | No | Experimental | Yes (Docker) |
As you can see, it's important to review your sandbox state in each tool to make sure you've opted in to sandboxing.
In my testing, your shell environment variables are always available in commands issued by the agents, regardless of sandboxing settings. You may want to customize your shell environment accordingly if you want additional protections on this front.
All three CLIs on macOS rely on a tool called sandbox-exec that Apple has marked
deprecated. This is worth watching to see if it becomes problematic.
Quick Start
Claude Code (sandboxing docs) — Run /sandbox and choose a mode:
/sandbox
Codex (sandboxing docs) — Sandboxing is on by default. You can confirm by entering /status. The command line options include:
# Convenience alias for sandboxed automatic execution
codex --full-auto
# Or configure explicitly
codex --sandbox workspace-write --ask-for-approval on-request
Gemini CLI (sandboxing docs) — Enable sandbox mode explicitly:
gemini --sandbox
Not a Silver Bullet
Think of sandboxing as one layer of defense, not a complete solution. Here are some things to keep in mind, even when you have sandboxing enabled.
Domain allowlisting is coarse. If you allow github.com, the agent can push to any repo you have access to. If you allow npmjs.com, it could publish a package.
Trusted code can be compromised. If a dependency contains adversarial instructions in comments, the agent will see and potentially follow them. The sandbox limits response, but cannot prevent the agent from being influenced more generally.
Insecure code generation. Even without malicious intent, AI agents can generate code with vulnerabilities. Sandboxing doesn't help here. Code review does.
Escape hatches exist. Every tool provides --yolo or danger-full-access modes. Your actual security is only as strong as your team's discipline in avoiding these modes, or using them only in carefully constructed environments.
Security bugs happen. Security is hard to get right. All three tools have had vulnerabilities discovered and patched. For example: [1] [2] [3]
The good news: Codex and Gemini CLI are fully open source, and Claude Code's sandbox implementation is open source (though the rest of Claude Code is not). Security fixes happen in public. Keep your CLIs updated.
Recommendations
By risk profile:
- Regulated industry. Start with read-only mode. Enable writes only for approved projects. Document configuration for compliance.
- Heavy open-source use. Extra caution for prompt injection via package READMEs. Stricter network controls.
- High isolation needs. Run the agent inside a Docker container or VM. If something goes wrong, blow away the container.
Universal:
- Keep CLI versions updated
- Good git hygiene: feature branches, frequent commits, review diffs before pushing
- Test your sandbox before trusting it. Codex provides
codex sandbox macos <command>to verify behavior. - Avoid YOLO mode. Typically the productivity gain is marginal; the risk difference is not.
The Bottom Line
Sandboxing doesn't make AI coding assistants entirely safe, but it does give you guarantees where you'd otherwise have none.
Familiarize yourself with the sandbox settings and how to adjust them according to the risk profile associated with your active work.
Be sure to keep your CLIs updated, since security patches come out regularly.
References
Official Documentation:
Security Research & Incidents:
- OWASP Top 10 for LLM Applications
- CVE-2025-59532 (Codex path validation)
- CVE-2025-61260 (Codex config injection)
- Gemini CLI vulnerability Example
Open Source: