LLM and MCP proxy for agent workspaces. The daemon proxies LLM API calls and remote MCP tool calls through a unix socket. The runtime drives the agent loop inside the container. Credentials never cross the socket boundary.
How It Works
Two components:
- Runtime: a binary inside the agent container that runs the agent loop. Waits for human messages, sends turns to the daemon, executes tools locally (bash, file I/O), and sends results back.
- Daemon: a sidecar process in the daemon container, one per agent. Listens on a unix socket, reads API credentials from mounted secret files, manages conversation history, forwards requests to the LLM provider, and executes MCP tool calls on behalf of the agent.
Each agent gets its own daemon instance. LLM config (provider, model, API key) comes from k8s Secret volume mounts at /run/secrets/llm/. MCP servers are declared as subdirectories of /run/secrets/mcp/.
The runtime sends new messages via turn requests over a length-prefixed binary framing protocol. The daemon reads the API key from a secret file, forwards to the LLM provider, and streams the response back. When the LLM requests MCP tools (GitHub, web search, etc.), the daemon executes them directly. Every exchange is logged to NDJSON files.
Why Tightbeam
AI agents running in containers need to call LLM APIs, but giving them API keys means:
- Credential exposure — a compromised agent leaks your API key
- No audit trail — the agent calls whatever it wants with your credentials
- No conversation control — the agent manages its own context window
Tightbeam solves this by proxying LLM calls through the daemon. The container sends messages, the daemon attaches credentials and manages history. The runtime is stateless — it doesn't know the API key, the model, or even the provider.
When the container has no network egress, tightbeam is the agent's sole communication gateway to the outside world.
Use Airlock for CLI credential isolation. Use Tightbeam for LLM API isolation.
Installation
Container Setup
Download the runtime from releases and add it to your Dockerfile:
COPY tightbeam /usr/local/bin/tightbeam
The runtime drives the agent loop. It connects to the daemon socket, loads the system prompt from /etc/agent/ (all .md files, sorted and concatenated), and enters the agent loop.
tightbeam --tools bash,read_file,write_file,list_directory
The runtime connects to the daemon socket at /run/tightbeam/tightbeam.sock.
Flags:
| Flag | Required | Default | Description |
|---|---|---|---|
| --tools | yes | none | Comma-separated tool list |
| --max-iterations | no | 100 | Max tool call rounds per human message |
| --max-output-chars | no | 30000 | Truncate tool output beyond this |
Available tools: bash, read_file, write_file, list_directory.
The system prompt is assembled automatically from all .md files in /etc/agent/ inside the container. Files are sorted by path and concatenated, supporting both single-file and multi-file layouts.
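The assembly step can be sketched as follows. This is an illustration, not the runtime's code: the function name `load_system_prompt`, the recursive search, and the blank-line separator are assumptions, since the text only says files are sorted by path and concatenated.

```python
from pathlib import Path


def load_system_prompt(root="/etc/agent", separator="\n\n") -> str:
    """Concatenate every .md file under root, ordered by path.

    Assumptions: subdirectories are searched recursively and files are
    joined with a blank line. The real runtime may differ on both points.
    """
    files = sorted(
        (p for p in Path(root).rglob("*.md") if p.is_file()),
        key=lambda p: p.as_posix(),
    )
    return separator.join(p.read_text(encoding="utf-8") for p in files)
```

A single-file layout (one prompt.md) and a multi-file layout (several numbered files) both go through the same code path.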
LLM Config (k8s Secret Mount)
LLM config is read from files mounted at /run/secrets/llm/:
/run/secrets/llm/provider -> "anthropic"
/run/secrets/llm/model -> "claude-sonnet-4-20250514"
/run/secrets/llm/api-key -> "sk-ant-..."
/run/secrets/llm/max-tokens -> "8192" # optional, defaults to 8192
The daemon reads these files at startup. Missing provider, model, or api-key is a hard error. Values are trimmed of whitespace.
In k8s, these are populated by a Secret volume mount. The orchestrator creates the Secret and mounts it into the daemon container.
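A minimal sketch of this loading behavior, assuming one value per file. The names `LlmConfig`, `ConfigError`, and `load_llm_config` are hypothetical, and treating an empty file as missing is an assumption the text does not confirm.

```python
from dataclasses import dataclass
from pathlib import Path


class ConfigError(Exception):
    """Raised when a required config file is missing (hypothetical name)."""


@dataclass
class LlmConfig:
    provider: str
    model: str
    api_key: str
    max_tokens: int = 8192  # documented default


def load_llm_config(root="/run/secrets/llm") -> LlmConfig:
    base = Path(root)

    def read(name):
        path = base / name
        # Values are trimmed of surrounding whitespace.
        return path.read_text(encoding="utf-8").strip() if path.is_file() else None

    required = {name: read(name) for name in ("provider", "model", "api-key")}
    # Assumption: an empty file counts as missing.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ConfigError("missing LLM config: " + ", ".join(missing))

    raw_max_tokens = read("max-tokens")
    return LlmConfig(
        provider=required["provider"],
        model=required["model"],
        api_key=required["api-key"],
        max_tokens=int(raw_max_tokens) if raw_max_tokens else 8192,
    )
```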
MCP Config (Mounted Directory)
MCP servers are declared as subdirectories of /run/secrets/mcp/:
/run/secrets/mcp/
  github/
    url          # "https://mcp.github.com/sse" (required)
    auth_token   # "ghp_xxxx" (optional; absent means no auth)
    tools        # one tool name per line (optional; absent means all)
  filesystem/
    url          # "http://localhost:9100/sse"
Each subdirectory is one MCP server. The subdirectory name is the server's logical name. If the directory is empty or doesn't exist, the daemon starts with zero MCP servers.
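The directory convention above could be read with something like the following sketch. `McpServer` and `load_mcp_servers` are illustrative names, and ignoring blank lines in the tools file is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class McpServer:
    name: str                          # subdirectory name = logical server name
    url: str
    auth_token: Optional[str] = None   # None means no auth
    tools: Optional[List[str]] = None  # None means all tools are allowed


def _read(path: Path) -> Optional[str]:
    return path.read_text(encoding="utf-8").strip() if path.is_file() else None


def load_mcp_servers(root="/run/secrets/mcp") -> List[McpServer]:
    base = Path(root)
    if not base.is_dir():
        return []  # missing directory: zero MCP servers
    servers = []
    for entry in sorted(base.iterdir(), key=lambda p: p.name):
        if not entry.is_dir():
            continue
        url = _read(entry / "url")
        if not url:
            raise ValueError(f"{entry.name}: url file is required")
        tools_text = _read(entry / "tools")
        tools = None
        if tools_text is not None:
            # Assumption: blank lines in the tools file are ignored.
            tools = [line.strip() for line in tools_text.splitlines() if line.strip()]
        servers.append(McpServer(entry.name, url, _read(entry / "auth_token") or None, tools))
    return servers
```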
Docker Run
Mount the agent's socket into the container:
docker run \
  -v /run/sockets/my-agent.sock:/run/docker-tightbeam.sock \
  your-image
Conversation logs are written to <logs-dir>/<name>/. Mount them read-only if the agent needs prior context on restart:
docker run \
  -v /run/sockets/my-agent.sock:/run/docker-tightbeam.sock \
  -v /var/log/tightbeam/my-agent:/var/log/tightbeam:ro \
  your-image
Usage
Daemon
tightbeam-daemon start
tightbeam-daemon logs
tightbeam-daemon send <message>
tightbeam-daemon check
tightbeam-daemon version
No flags. All paths are hardcoded:
- LLM config: /run/secrets/llm/
- MCP config: /run/secrets/mcp/
- Socket: /run/tightbeam/tightbeam.sock
- Logs: /var/log/tightbeam/conversation.ndjson
The orchestrator mounts configs, secrets, and volumes at these paths.
Runtime
The runtime runs inside the container. It connects to the daemon socket, loads the system prompt from /etc/agent/, and enters the agent loop:
- Register with the daemon, then wait for a human message
- Send a turn with the message (system prompt and tools are cached from the first turn)
- If the LLM returns tool calls, execute them locally and send results in a new turn
- Repeat until end_turn or max_tokens, then wait for the next human message
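The loop for a single human message can be sketched as below. This is a simplified model, not the runtime's implementation: `send_turn` and `execute_tool` are hypothetical callables standing in for the socket round trip and the local tool runner, and raising once the iteration limit is hit is an assumption.

```python
def handle_human_message(message, send_turn, execute_tool, setup=None,
                         max_iterations=100, max_output_chars=30000):
    """Drive turns for one human message until the LLM stops asking for tools.

    setup carries {"system": ..., "tools": ...} on the very first turn only;
    later turns omit it because the daemon caches both.
    """
    params = dict(setup or {})
    params["messages"] = [message]
    for _ in range(max_iterations):
        result = send_turn(params)
        if result["stop_reason"] != "tool_use":
            return result  # end_turn or max_tokens
        tool_messages = []
        for call in result["tool_calls"]:
            # Run the tool locally and truncate oversized output.
            output = execute_tool(call["name"], call["input"])[:max_output_chars]
            tool_messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": [{"type": "text", "text": output}],
            })
        params = {"messages": tool_messages}
    # Assumption: the behavior at the limit is not documented.
    raise RuntimeError("max iterations reached")
```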
Socket Protocol
JSON-RPC 2.0 with length-prefixed binary framing. Each message is preceded by a 4-byte big-endian u32 payload length, followed by the UTF-8 JSON payload.
[4 bytes: u32 big-endian length][payload bytes]
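As an illustration of the framing, here is a minimal encoder and decoder. The helper names are hypothetical; only the wire layout (4-byte big-endian u32 length, then the payload) comes from the description above.

```python
import io
import json
import struct


def encode_frame(payload: bytes) -> bytes:
    """Prefix payload with its length as a 4-byte big-endian u32."""
    if len(payload) > 0xFFFFFFFF:
        raise ValueError("payload exceeds the u32 framing limit")
    return struct.pack(">I", len(payload)) + payload


def encode_json(message: dict) -> bytes:
    """Frame a JSON-RPC message as UTF-8 JSON."""
    return encode_frame(json.dumps(message).encode("utf-8"))


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError on a short read."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-frame")
        buf += chunk
    return buf


def read_frame(stream) -> bytes:
    """Read one length-prefixed frame and return its payload."""
    (length,) = struct.unpack(">I", read_exact(stream, 4))
    return read_exact(stream, length)


# Round trip through an in-memory stream standing in for the socket.
wire = io.BytesIO(encode_json({"jsonrpc": "2.0", "method": "register",
                               "params": {"role": "runtime"}}))
decoded = json.loads(read_frame(wire))
```

With a real connection, `stream` would be a file-like object wrapping the unix socket, for example the result of `socket.makefile("rb")`.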
All content fields are arrays of typed blocks:
{"role": "user", "content": [{"type": "text", "text": "Hello"}]}
Request: turn
The runtime sends new messages, tool definitions, and optionally a system prompt. System and tools are sent on the first turn and cached by the daemon.
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "turn",
  "params": {
    "system": "You are a coding assistant.",
    "tools": [{"name": "bash", "description": "Run a command", "parameters": {"type": "object"}}],
    "messages": [{"role": "user", "content": [{"type": "text", "text": "What files are in src?"}]}]
  }
}
Tool results are sent as messages in a subsequent turn:
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "turn",
  "params": {
    "messages": [{"role": "tool", "tool_call_id": "tc-001", "content": [{"type": "text", "text": "main.rs\nlib.rs\n"}]}]
  }
}
Response: Streaming Notifications
No id field — these stream in real time as the LLM generates output.
{"jsonrpc": "2.0", "method": "output", "params": {"stream": "content", "data": {"type": "text", "text": "The src"}}}
{"jsonrpc": "2.0", "method": "output", "params": {"stream": "content", "data": {"type": "text", "text": " directory contains"}}}
Response: Final
Has id — signals completion of this turn.
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "stop_reason": "end_turn",
    "content": [{"type": "text", "text": "The src directory contains main.rs and lib.rs."}]
  }
}
When the LLM requests tool calls:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "stop_reason": "tool_use",
    "tool_calls": [{"id": "tc-001", "name": "bash", "input": {"command": "ls src/"}}]
  }
}
Response: Error
API errors are forwarded as-is. Tightbeam does not retry. The agent decides what to do.
{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {"code": 429, "message": "Rate limit exceeded."}
}
Connection Handshake
Every connection to an agent socket must identify itself with its first frame:
- Runtime sends a register request. The daemon enters the runtime handler (turn loop).
- CLI / channel adapter sends a send request. The daemon enters the subscriber handler.
Runtime registration:
{"jsonrpc": "2.0", "method": "register", "params": {"role": "runtime"}}
No response. The daemon begins sending human_message notifications when messages arrive.
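The first-frame routing can be pictured with a small sketch. `route_first_frame` is a hypothetical helper, and rejecting any other first frame with an error is an assumption; the text does not say how the daemon handles an unexpected handshake.

```python
def route_first_frame(message: dict) -> str:
    """Pick a handler from the first frame on a new connection."""
    method = message.get("method")
    if method == "register" and message.get("params", {}).get("role") == "runtime":
        return "runtime"      # turn loop handler
    if method == "send":
        return "subscriber"   # CLI or channel adapter handler
    # Assumption: anything else is rejected.
    raise ValueError(f"unexpected first frame: {method!r}")
```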
Request: send
Inject a human message into the agent's conversation. Sent by CLI or channel adapters as the first frame on a new connection.
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "send",
  "params": {
    "content": [{"type": "text", "text": "Create a hello world file"}]
  }
}
Response (agent idle, message delivered immediately):
{"jsonrpc": "2.0", "id": 1, "result": {"status": "delivered"}}
Response (agent busy, message queued):
{"jsonrpc": "2.0", "id": 1, "result": {"status": "queued"}}
After a queued message is delivered:
{"jsonrpc": "2.0", "method": "delivered"}
Response (no runtime connected):
{"jsonrpc": "2.0", "id": 1, "error": {"code": -32000, "message": "runtime not connected"}}
File Transfer
The send request supports image delivery via a multi-frame protocol. The content array includes file_incoming blocks alongside text, followed by one raw-byte frame per file.
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "send",
  "params": {
    "content": [
      {"type": "text", "text": "Describe this image"},
      {"type": "file_incoming", "filename": "photo.png", "mime_type": "image/png", "size": 102400}
    ]
  }
}
The daemon validates all file_incoming blocks, then writes the RPC response (delivered/queued) before reading any file frames. The CLI reads the response before sending file data. This ordering prevents deadlock.
Each file frame uses the same 4-byte BE u32 length prefix as JSON frames, but the payload is raw bytes (not JSON). Frames are sent in the same order as their file_incoming blocks in the content array.
The daemon base64-encodes each file and replaces the file_incoming block with an image block before delivering to the runtime:
{"type": "image", "media_type": "image/png", "data": "<base64>"}
v1 supports: image/png, image/jpeg, image/gif, image/webp. Unsupported MIME types are rejected with an error response. Files larger than 4GB are rejected (framing limit).
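A sketch of both sides of the file transfer, under stated assumptions: `build_send_frames` and `to_image_block` are hypothetical names, and the client-side MIME check mirrors the daemon's validation only for illustration. The JSON frame and the file frames are returned separately because the sender must read the RPC response between them.

```python
import base64
import json
import struct

SUPPORTED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def _frame(payload: bytes) -> bytes:
    # Same 4-byte big-endian u32 length prefix for JSON and raw-byte frames.
    return struct.pack(">I", len(payload)) + payload


def build_send_frames(text, files, request_id=1):
    """files is a list of (filename, mime_type, data) tuples.

    Returns (json_frame, file_frames). File frames keep the order of their
    file_incoming blocks in the content array.
    """
    content = [{"type": "text", "text": text}]
    for filename, mime_type, data in files:
        if mime_type not in SUPPORTED_MIME_TYPES:
            raise ValueError(f"unsupported MIME type: {mime_type}")
        content.append({"type": "file_incoming", "filename": filename,
                        "mime_type": mime_type, "size": len(data)})
    request = {"jsonrpc": "2.0", "id": request_id, "method": "send",
               "params": {"content": content}}
    json_frame = _frame(json.dumps(request).encode("utf-8"))
    return json_frame, [_frame(data) for _, _, data in files]


def to_image_block(mime_type: str, data: bytes) -> dict:
    """Daemon-side conversion of a received file into an image block."""
    return {"type": "image", "media_type": mime_type,
            "data": base64.b64encode(data).decode("ascii")}
```

A sender would write `json_frame`, read the delivered/queued response, and only then write each entry of `file_frames`.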
Subscriber Notifications
After sending a message, the connection becomes a subscriber. Subscribers receive copies of the agent's text output (not tool calls) and lifecycle events:
Text output (streamed):
{"jsonrpc": "2.0", "method": "output", "params": {"stream": "content", "data": {"type": "text", "text": "Hello"}}}
Agent turn complete:
{"jsonrpc": "2.0", "method": "end_turn"}
Runtime disconnected:
{"jsonrpc": "2.0", "method": "error", "params": {"message": "agent disconnected"}}
Subscribers receive text output only. Tool use events are internal to the runtime and are not broadcast.
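A subscriber can tell responses from notifications by the presence of an id field. The sketch below shows one way a client might consume the stream; `collect_output` is a hypothetical helper, and raising on the error notification is a design choice of this example rather than documented behavior.

```python
def collect_output(messages) -> str:
    """Gather streamed text from decoded subscriber messages until end_turn.

    messages is any iterable of already-decoded JSON-RPC objects.
    """
    parts = []
    for message in messages:
        if "id" in message:
            continue  # the delivered/queued response to our own send request
        method = message.get("method")
        if method == "output":
            parts.append(message["params"]["data"]["text"])
        elif method == "end_turn":
            break
        elif method == "error":
            raise RuntimeError(message["params"]["message"])
        # Other notifications, such as "delivered", carry no text.
    return "".join(parts)
```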
Configuration
Agent Config
Config comes from two sources:
LLM — k8s Secret volume mounted at /run/secrets/llm/:
/run/secrets/llm/provider -> "anthropic"
/run/secrets/llm/model -> "claude-sonnet-4-20250514"
/run/secrets/llm/api-key -> "sk-ant-..."
/run/secrets/llm/max-tokens -> "8192" # optional, defaults to 8192
Missing provider, model, or api-key is a hard error. Values are trimmed of whitespace.
MCP — subdirectories of /run/secrets/mcp/:
/run/secrets/mcp/
  github/
    url          # "https://mcp.github.com/sse" (required)
    auth_token   # "ghp_xxxx" (optional; absent means no auth)
    tools        # one tool name per line (optional; absent means all)
  web-search/
    url          # "https://mcp.search.example.com/sse"
    auth_token   # "search-token-xxx"
Each subdirectory is one MCP server. The subdirectory name is the server's logical name. If the directory is empty or doesn't exist, the daemon starts with zero MCP servers.
MCP Tool Allowlists
The tools file in each MCP server subdirectory controls which tools the LLM can call:
| Value | Meaning |
|---|---|
| file absent | Allow all tools from the server |
| file present, one name per line | Allow only named tools |
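The allowlist rule in the table amounts to a simple filter. In this sketch, `filter_tools` is a hypothetical name, and a `None` allowlist stands for an absent tools file.

```python
def filter_tools(discovered, allowlist=None):
    """Return the tools the LLM may call for one MCP server.

    discovered: list of tool definitions, each a dict with a "name" key.
    allowlist:  None when the tools file is absent (allow everything),
                otherwise the list of names read from the file.
    """
    if allowlist is None:
        return list(discovered)
    allowed = set(allowlist)
    return [tool for tool in discovered if tool["name"] in allowed]
```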
MCP Support
The daemon acts as an MCP client. It connects to remote MCP servers, discovers their tools, and merges them with the runtime's local tools. The LLM sees one flat tool list.
When the LLM returns tool calls, the daemon partitions them:
- All local — returned to the runtime for execution
- All MCP — daemon executes them, appends results to conversation, calls the LLM again. The runtime waits and receives the final response.
- Mixed — daemon executes MCP calls immediately, returns only the local calls to the runtime. When the runtime sends back local results, the daemon interleaves all results in the original call order and continues.
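The partitioning and the reordering of results can be sketched as follows. Both function names are hypothetical, and the sketch assumes the daemon knows the set of MCP tool names from discovery.

```python
def partition_tool_calls(tool_calls, mcp_tool_names):
    """Split tool calls into (local, mcp), preserving order within each group."""
    local = [call for call in tool_calls if call["name"] not in mcp_tool_names]
    mcp = [call for call in tool_calls if call["name"] in mcp_tool_names]
    return local, mcp


def interleave_results(tool_calls, results_by_id):
    """Order results to match the original call order.

    results_by_id maps each tool call id to its result message, whether the
    daemon (MCP) or the runtime (local) produced it.
    """
    return [results_by_id[call["id"]] for call in tool_calls]
```

For a mixed batch, the daemon would execute the `mcp` group, hand the `local` group to the runtime, then merge both result sets with `interleave_results` before calling the LLM again.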
MCP connections are lazy (first turn, not startup) and cached for the session. Auth uses Bearer tokens read from the auth_token file. If a connection drops mid-session, the daemon retries once.
Conversation Ownership
Tightbeam owns the conversation. The runtime is stateless.
1. Runtime sends a turn with new messages
2. Daemon logs messages to NDJSON, attaches credentials, forwards to LLM
3. Response streams back; daemon logs it, forwards to runtime
4. Runtime executes tool calls locally, sends results in next turn
5. Daemon logs results, calls LLM again
6. Loop continues until end_turn or max_tokens
7. External messages arrive via send; the daemon delivers them as human_message notifications to the runtime, and the loop resumes from step 1
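NDJSON logging means one JSON object per line, appended as each message passes through. The sketch below shows the format only; the helper names are hypothetical, and logging the message object as-is is an assumption, since the record schema is not documented here.

```python
import json


def append_ndjson(path, record: dict) -> None:
    """Append one record as a single compact JSON line."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, separators=(",", ":")) + "\n")


def read_ndjson(path):
    """Load every record from an NDJSON file, skipping blank lines."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Because json.dumps escapes newlines inside strings, multi-line tool output still occupies exactly one log line.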
Filesystem Layout
Each daemon instance serves one agent. All paths are hardcoded:
/run/secrets/llm/ # LLM config (k8s Secret mount)
/run/secrets/mcp/ # MCP server configs (mounted directory)
/run/tightbeam/tightbeam.sock # unix socket (mode 0600)
/var/log/tightbeam/conversation.ndjson # conversation log
The orchestrator mounts secrets and configs at these locations.
Security Model
- LLM credentials are read from k8s Secret volume mounts at /run/secrets/llm/. MCP credentials are read from auth_token files in /run/secrets/mcp/<server>/.
- API keys and MCP auth tokens never cross the socket boundary.
- The agent does not know which model or provider it talks to.
- LLM provider is swappable at the config layer without agent changes.
- MCP servers are configured by the daemon. The runtime has no knowledge of MCP.
- All messages (user, assistant, tool results) are logged to NDJSON files.
- Errors are forwarded as-is. Tightbeam does not retry or modify API responses.
- External message delivery (send) goes through the daemon. The daemon queues messages for a busy agent and tracks subscribers. Subscribers see agent text output only, never tool call internals.