GitHub - SecBear/nix-sandbox-mcp: Sandboxed code execution for LLMs, powered by Nix

nix-sandbox-mcp

Sandboxed code execution for LLMs, powered by Nix.

LLMs need to run code. Most solutions reach for Docker — heavyweight, non-reproducible, and yet another daemon to manage. nix-sandbox-mcp uses Nix instead: environments are declarative flake expressions, sandboxing is jail.nix (bubblewrap + Linux namespaces, no root required), and a planned microvm.nix backend adds full VM isolation when you need it. Everything runs locally — no cloud, no containers, no image pulls.

Quick Start

Requirements: Linux with Nix (flakes enabled). The sandbox uses bubblewrap + Linux namespaces for isolation — macOS and Windows are not supported. WSL2 may work if your kernel has user namespaces enabled.

Add to your MCP client config:

{
  "mcpServers": {
    "nix-sandbox": {
      "command": "nix",
      "args": ["run", "github:secbear/nix-sandbox-mcp", "--", "--stdio"],
      "env": {
        "PROJECT_DIR": "/home/user/myproject"
      }
    }
  }
}

That's it. The LLM gets three sandboxed environments (shell, python, node) with your project mounted read-only at /project. Drop PROJECT_DIR if you don't need project access.

Custom Environments

The bundled presets are a starting point. Define your own with a Nix flake:

# my-envs/flake.nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    nix-sandbox-mcp.url = "github:secbear/nix-sandbox-mcp";
  };

  outputs = { nixpkgs, nix-sandbox-mcp, ... }:
    let pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      packages.x86_64-linux = {
        data-science = nix-sandbox-mcp.lib.mkSandbox {
          inherit pkgs;
          name = "data-science";
          interpreter_type = "python";
          packages = [
            (pkgs.python3.withPackages (ps: [ ps.numpy ps.pandas ps.requests ]))
          ];
        };

        nix-tools = nix-sandbox-mcp.lib.mkSandbox {
          inherit pkgs;
          name = "nix-tools";
          interpreter_type = "bash";
          packages = [ pkgs.ripgrep pkgs.fd pkgs.jq pkgs.yq-go pkgs.tree ];
        };
      };
    };
}

Point NIX_SANDBOX_ENVS at your flake refs. They're built at server startup and merged with the bundled presets:

{
  "mcpServers": {
    "nix-sandbox": {
      "command": "nix",
      "args": ["run", "github:secbear/nix-sandbox-mcp", "--", "--stdio"],
      "env": {
        "PROJECT_DIR": "/home/user/myproject",
        "NIX_SANDBOX_ENVS": "github:myorg/envs#data-science,github:myorg/envs#nix-tools"
      }
    }
  }
}

Now the LLM can use custom tools against your live codebase, fully sandboxed:

# env: "nix-tools"
rg "TODO" /project/src --type rust -c
# /project/src/main.rs:3
# /project/src/config.rs:1

# env: "data-science"
import pandas as pd
df = pd.read_csv("/project/data/results.csv")
print(df.describe())

interpreter_type maps the sandbox to an agent REPL — "python", "bash", or "node". Pass a session ID to persist variables and imports across calls.

If you prefer pre-building over startup builds, nix build your sandbox into ~/.config/nix-sandbox-mcp/sandboxes/ and skip NIX_SANDBOX_ENVS entirely. The daemon scans that directory at startup.

Configuration

All runtime settings are env vars in the MCP client JSON:

Variable	Purpose	Default
`PROJECT_DIR`	Project directory to mount read-only	(none)
`PROJECT_MOUNT`	Mount point inside sandbox	`/project`
`NIX_SANDBOX_ENVS`	Comma-separated flake refs to build at startup	(none)
`NIX_SANDBOX_DIR`	Pre-built sandbox directory	`~/.config/nix-sandbox-mcp/sandboxes`
`SESSION_IDLE_TIMEOUT`	Idle timeout in seconds	`300`
`SESSION_MAX_LIFETIME`	Max session lifetime in seconds	`3600`

Build-time settings (environment definitions, default timeouts) live in config.example.toml for customizing the bundled presets or baking additional environments into the server at build time.

Security

jail.nix (namespace isolation) — the current backend. Uses bubblewrap to create unprivileged sandboxes with separate user, PID, network, and mount namespaces. No network access by default. Project files are mounted read-only. This protects against accidental damage and opportunistic malicious code. It does not protect against kernel exploits — the sandbox shares the host kernel.

microvm.nix (VM isolation) — planned. Separate Linux kernel per sandbox via KVM, virtiofs for store access, vsock for communication. Full isolation including kernel attack surface. This is the right choice for running untrusted code from the internet.

Architecture

MCP Client
  │ JSON-RPC over stdio
  ▼
Shell wrapper
  │ builds NIX_SANDBOX_ENVS, execs daemon
  ▼
Rust daemon
  ├─ ephemeral ──▶ bubblewrap jail ──▶ interpreter
  └─ session   ──▶ bubblewrap jail ──▶ sandbox_agent.py ──▶ persistent REPL

The daemon handles MCP protocol and process dispatch. Nix handles everything else — environment resolution, package composition, sandbox wrapper generation. Environments come from three sources (bundled presets, NIX_SANDBOX_ENVS startup builds, pre-built artifacts in $NIX_SANDBOX_DIR) and all produce the same artifact format. The daemon doesn't know which source an environment came from.

See CONTRIBUTING.md for build instructions, repo layout, and internals.

Design: Context Budget

MCP servers pay a token tax: every tool schema is injected into the LLM's context window at connection time. A server exposing 60 tools can burn ~47k tokens before the user says anything. This matters because context is finite and expensive — tokens spent on tool definitions are tokens unavailable for reasoning.

Common approaches and their costs:

Approach	Init cost	Trade-off
Static loading (all tools upfront)	~150 tokens × N tools	Context bloat scales linearly with tool count
Dynamic discovery (list → schema → call)	~400 tokens fixed	Extra round-trips per invocation; LLM must learn discovery protocol
Skill/guide documents (SKILL.md)	~800 tokens on activation	Rich guidance but heavy; separate document to maintain

Our approach: one parameterized tool.

nix-sandbox-mcp exposes a single run tool that takes an env parameter. Adding environments (python, node, shell, custom flakes) doesn't add tools — it adds a value to a parameter. The fixed context cost is ~420 tokens regardless of how many environments are configured:

Component	Tokens	What it contains
Tool schema	~75	Name, params (`code`, `env`, `session`), selection guidance
Server instructions	~160	Environment list, session workflow, debugging hints
Per-parameter descriptions	~80	Field-level usage hints via JSON Schema
Total	~420	Constant — does not grow with environment count

Compare: if each environment were a separate tool (3 bundled + 5 custom = 8 tools), that would cost ~1,200+ tokens and grow with every environment added.

Where guidance lives:

Rather than a separate guidance document, tool-selection and workflow hints are embedded directly in the MCP protocol fields that LLMs already read:

Tool description — when to use the sandbox vs built-in shell (isolation, reproducibility, resource limits vs file edits, git, host commands)
Server instructions — available environments, session lifecycle (ephemeral by default, sessions for multi-step work), debugging hints
Parameter descriptions — per-field usage via JSON Schema description

This keeps all guidance in-band and co-located with the tool definition. No extra documents to load, no discovery protocol to learn, no activation step.

Roadmap

Phase	Status	What
1	Done	jail.nix backend, bundled presets, MCP protocol
2a	Done	Project mounting, custom flake refs in config
2b	Done	Session persistence (Python, Bash, Node REPLs)
2c	Done	Decoupled sandboxes (`mkSandbox`, directory scanning)
2d	Done	MCP-conventional config (env vars, `NIX_SANDBOX_ENVS`)
3a	Planned	microvm.nix backend for hardware-level isolation
3b	Planned	Dead interpreter recovery (restart bash/node on crash)

License

MIT