v0.2.1 — available now
Pruner runs silently in the background while you use Claude Code, automatically reducing what you spend on every API call — without changing how Claude behaves.
$ curl -fsSL https://raw.githubusercontent.com/OneGoToAI/Pruner/main/install.sh | bash
macOS (Apple Silicon & Intel) · Linux x64 · requires Claude Code CLI
Works in 30 seconds
No config files. No API keys to manage. No code changes.

1. Install
$ curl -fsSL .../install.sh | bash
One command. No Node.js or npm required. Self-contained binary under 20 MB.

2. Replace claude
All Claude flags work identically. --resume, -p, everything.

3. Watch savings
After each Claude response, Pruner prints exactly how much it saved — verified by Anthropic's own tokenizer.
4-Layer Intelligent Optimization
Advanced context optimization that goes beyond simple truncation, applied in real time before each request.
Smart Context Optimization
4-layer intelligence: tool-aware truncation, distance decay, content deduplication, and three-tier LLM-powered summaries. Each tool type gets different treatment based on information density and re-retrievability.
dedup · tool policies · LLM summaries · distance decay
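Pruner's implementation isn't shown on this page, but two of the layers named above — deduplication and distance decay — can be sketched in a few lines of Python. Everything below (function name, parameters, thresholds) is a hypothetical illustration, not Pruner's actual code:

```python
import hashlib

def prune_context(messages, keep_recent=4, max_old_chars=500):
    """Illustrative sketch of two pruning layers: content
    deduplication and distance decay. Names and thresholds
    are hypothetical, not Pruner's real implementation."""
    seen = set()
    pruned = []
    n = len(messages)
    for i, msg in enumerate(messages):
        # Deduplication layer: drop any message whose content hash
        # has already appeared (e.g. the same file read twice).
        digest = hashlib.sha256(msg["content"].encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Distance-decay layer: the further a message sits from the
        # end of the conversation, the smaller its character budget.
        distance = n - 1 - i
        if distance >= keep_recent:
            budget = max(max_old_chars // (distance - keep_recent + 1), 80)
            if len(msg["content"]) > budget:
                msg = {**msg, "content": msg["content"][:budget] + "...[truncated]"}
        pruned.append(msg)
    return pruned
```

The recent tail of the conversation is kept verbatim; only older, duplicated, or low-density content shrinks.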
Prompt Cache Injection
Anthropic's prompt cache cuts repeated input costs by 90%. Pruner automatically injects cache_control on large system prompts so you get cache hits without any code changes.
cache_read: $0.30/M vs input: $3.00/M
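As a sketch of the idea (not Pruner's actual code): Anthropic's Messages API accepts the system prompt either as a plain string or as a list of content blocks, and a block marked with cache_control {"type": "ephemeral"} becomes cacheable. The function name and size threshold below are assumptions:

```python
def inject_cache_control(body, min_chars=4096):
    """Sketch of prompt-cache injection: convert a large string-form
    system prompt into block form and mark it cacheable. The
    min_chars threshold is a hypothetical stand-in for Pruner's
    real size heuristic."""
    system = body.get("system")
    if isinstance(system, str) and len(system) >= min_chars:
        body = {**body, "system": [{
            "type": "text",
            "text": system,
            # Ephemeral cache marker, per Anthropic's prompt-caching API.
            "cache_control": {"type": "ephemeral"},
        }]}
    return body
```

Small system prompts pass through untouched, since prompts below the cache's minimum size can't produce cache hits anyway.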
Verified Savings
Savings figures use Anthropic's own count_tokens API and actual usage.input_tokens from each response — not estimates. What Pruner shows matches your bill.
✓ verified · ~estimated (fallback)
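Using the per-million prices quoted above ($3.00/M for fresh input, $0.30/M for cache reads), the savings arithmetic is simple. The helper names below are illustrative, not Pruner's API:

```python
INPUT_PER_M = 3.00       # $/M fresh input tokens (price quoted above)
CACHE_READ_PER_M = 0.30  # $/M cache-read tokens (price quoted above)

def request_cost(input_tokens, cache_read_tokens=0):
    """Cost of one request, splitting fresh input from cache reads."""
    return (input_tokens * INPUT_PER_M
            + cache_read_tokens * CACHE_READ_PER_M) / 1_000_000

def savings(before_tokens, input_tokens, cache_read_tokens=0):
    """Dollars saved versus sending the unpruned, uncached context."""
    return request_cost(before_tokens) - request_cost(input_tokens, cache_read_tokens)
```

For example, turning a 120,000-token unpruned context into 20,000 fresh tokens plus 80,000 cache-read tokens saves $0.36 - $0.084, about $0.28 per request.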
Your code never leaves your machine
Pruner is a local-only proxy. Your prompts, API key, and codebase flow exactly one place: directly to api.anthropic.com — the same destination as without Pruner.
Binds only to localhost
The proxy listens exclusively on 127.0.0.1. It is not accessible from your local network, your router, or the internet.
Only talks to Anthropic
Zero telemetry. Zero analytics. No Pruner backend exists. Every outbound byte goes to api.anthropic.com:443 — nothing else.
API key never stored
Your Anthropic API key is forwarded in memory, transparently — identical to how the Claude CLI handles it. Pruner never writes it to disk or to logs.
Open source & auditable
Every line of code is on GitHub under the MIT license. Read it, audit it, or compile the binary yourself — the output is bit-for-bit identical.
Don't trust us — verify it yourself
Run pruner --debug to see a live log of every outbound connection, or verify independently with your OS's own tools:
# Pruner's built-in audit log
$ pruner --debug
→ api.anthropic.com:443
✗ no other connections

# Independent OS verification
$ sudo lsof -i -n -P | grep pruner
pruner → api.anthropic.com:443   (only one remote address)
Commands
Every Claude flag works. A few extras.
Install
No Node.js, no npm, no dependencies. Single binary.
curl (recommended)
Works on macOS and Linux. Detects your architecture automatically.
$ curl -fsSL https://raw.githubusercontent.com/OneGoToAI/Pruner/main/install.sh | bash
Homebrew
macOS only. Easier to update later with brew upgrade pruner.
$ brew install OneGoToAI/tap/pruner
Requirements
- macOS (Apple Silicon or Intel) or Linux x64
- Claude Code CLI installed and logged in
- That's it — no Node.js, no Python, no other dependencies
Configuration
Run pruner config to open ~/.pruner/config.json.
Changes take effect immediately — no restart required.
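The config schema isn't documented on this page. In the sketch below, only maxMessages is mentioned elsewhere here (in the FAQ); the other keys, and every value shown, are hypothetical placeholders included purely to show the shape of a JSON config. Run pruner config to see the real schema.

```json
{
  "maxMessages": 40,
  "contextPruning": true,
  "promptCacheInjection": true
}
```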
Frequently asked
Does Pruner see my API key or code?
Pruner is a local-only proxy — it only listens on 127.0.0.1 and only connects to api.anthropic.com. Your API key is forwarded transparently and never stored.
You can verify this yourself by running pruner --debug, which prints every outbound connection, or by inspecting with:
sudo lsof -i -n -P | grep pruner
Will it change Claude's behavior or break my workflow?
Claude's responses are never touched — Pruner only modifies what you send to Anthropic, not what comes back.
If Claude's context window feels different after aggressive pruning, you can tune maxMessages up in the config, or disable context pruning entirely while keeping prompt cache injection active.
How accurate are the savings figures?
Numbers marked ✓ verified come directly from Anthropic:
- Before token count — from Anthropic's /v1/messages/count_tokens API, called in parallel (zero latency impact)
- After token count — from usage.input_tokens in every API response
- Cache savings — from cache_read_input_tokens in every API response
If the count_tokens call fails (e.g., a network timeout), Pruner falls back to a tiktoken estimate and marks it ~estimated.
Does Pruner add latency to my requests?
Practically zero. The proxy overhead is <1ms. The count_tokens API call runs in parallel with the main request — Claude's generation (typically 3–30 seconds) takes far longer than the token count call.
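The zero-latency claim follows from simple overlap: when the token count runs concurrently with generation, total time is the maximum of the two calls, not their sum. A minimal sketch, with stand-in stubs where the real network calls would go (both stub functions and their timings are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def count_tokens(payload):
    # Stand-in stub for the /v1/messages/count_tokens call (fast).
    time.sleep(0.1)
    return 1234

def generate(payload):
    # Stand-in stub for the main /v1/messages call (slow; dominates latency).
    time.sleep(0.5)
    return {"usage": {"input_tokens": 1200}}

def proxied_request(payload):
    """Start both calls together: total latency equals the slower
    (generation) call, so the token count adds nothing."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        count_future = pool.submit(count_tokens, payload)
        gen_future = pool.submit(generate, payload)
        return count_future.result(), gen_future.result()
```

Run sequentially, the stubs would take 0.6 s; overlapped, the pair finishes in roughly the 0.5 s the slower call needs.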
Start saving today
One command. Zero config. Real savings.
$ curl -fsSL https://raw.githubusercontent.com/OneGoToAI/Pruner/main/install.sh | bash
Find it useful? A ⭐ helps others discover Pruner.
Star on GitHub