v0.2.1 — available now
Pruner runs silently in the background while you use Claude Code, automatically reducing what you spend on every API call — without changing how Claude behaves.
$ curl -fsSL https://raw.githubusercontent.com/OneGoToAI/Pruner/main/install.sh | bash
macOS (Apple Silicon & Intel) · Linux x64 · requires Claude Code CLI
Works in 30 seconds
No config files. No API keys to manage. No code changes.

1. Install
$ curl -fsSL .../install.sh | bash
One command. No Node.js or npm required. Self-contained binary under 20 MB.

2. Replace claude
All Claude flags work identically. --resume, -p, everything.

3. Watch savings
After each Claude response, Pruner prints exactly how much it saved — verified by Anthropic's own tokenizer.
4-Layer Intelligent Optimization
Advanced context optimization that goes beyond simple truncation, applied in real time before each request.
Smart Context Optimization
4-layer intelligence: tool-aware truncation, distance decay, content deduplication, and three-tier LLM-powered summaries. Each tool type gets different treatment based on information density and re-retrievability.
dedup · tool policies · LLM summaries · distance decay
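Pruner's implementation isn't shown on this page, but two of the layers named above — deduplication and distance decay — can be sketched in a few lines of Python. Everything below (function name, parameters, thresholds) is a hypothetical illustration, not Pruner's actual code:

```python
import hashlib

def prune_context(messages, keep_recent=4, max_old_chars=500):
    """Illustrative sketch of two pruning layers: content
    deduplication and distance decay. Names and thresholds
    are hypothetical, not Pruner's real implementation."""
    seen = set()
    pruned = []
    n = len(messages)
    for i, msg in enumerate(messages):
        # Deduplication layer: drop any message whose content hash
        # has already appeared (e.g. the same file read twice).
        digest = hashlib.sha256(msg["content"].encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Distance-decay layer: the further a message sits from the
        # end of the conversation, the smaller its character budget.
        distance = n - 1 - i
        if distance >= keep_recent:
            budget = max(max_old_chars // (distance - keep_recent + 1), 80)
            if len(msg["content"]) > budget:
                msg = {**msg, "content": msg["content"][:budget] + "...[truncated]"}
        pruned.append(msg)
    return pruned
```

The recent tail of the conversation is kept verbatim; only older, duplicated, or low-density content shrinks.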
Prompt Cache Injection
Anthropic's prompt cache cuts repeated input costs by 90%. Pruner automatically injects cache_control on large system prompts so you get cache hits without any code changes.
cache_read: $0.30/M vs input: $3.00/M
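As a sketch of the idea (not Pruner's actual code): Anthropic's Messages API accepts the system prompt either as a plain string or as a list of content blocks, and a block marked with cache_control {"type": "ephemeral"} becomes cacheable. The function name and size threshold below are assumptions:

```python
def inject_cache_control(body, min_chars=4096):
    """Sketch of prompt-cache injection: convert a large string-form
    system prompt into block form and mark it cacheable. The
    min_chars threshold is a hypothetical stand-in for Pruner's
    real size heuristic."""
    system = body.get("system")
    if isinstance(system, str) and len(system) >= min_chars:
        body = {**body, "system": [{
            "type": "text",
            "text": system,
            # Ephemeral cache marker, per Anthropic's prompt-caching API.
            "cache_control": {"type": "ephemeral"},
        }]}
    return body
```

Small system prompts pass through untouched, since prompts below the cache's minimum size can't produce cache hits anyway.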
Verified Savings
Savings figures use Anthropic's own count_tokens API and actual usage.input_tokens from each response — not estimates. What Pruner shows matches your bill.
✓ verified · ~estimated (fallback)
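Using the per-million prices quoted above ($3.00/M for fresh input, $0.30/M for cache reads), the savings arithmetic is simple. The helper names below are illustrative, not Pruner's API:

```python
INPUT_PER_M = 3.00       # $/M fresh input tokens (price quoted above)
CACHE_READ_PER_M = 0.30  # $/M cache-read tokens (price quoted above)

def request_cost(input_tokens, cache_read_tokens=0):
    """Cost of one request, splitting fresh input from cache reads."""
    return (input_tokens * INPUT_PER_M
            + cache_read_tokens * CACHE_READ_PER_M) / 1_000_000

def savings(before_tokens, input_tokens, cache_read_tokens=0):
    """Dollars saved versus sending the unpruned, uncached context."""
    return request_cost(before_tokens) - request_cost(input_tokens, cache_read_tokens)
```

For example, turning a 120,000-token unpruned context into 20,000 fresh tokens plus 80,000 cache-read tokens saves $0.36 - $0.084, about $0.28 per request.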
Your code never leaves your machine
Pruner is a local-only proxy. Your prompts, API key, and codebase flow exactly one place: directly to api.anthropic.com — the same destination as without Pruner.
Binds only to localhost
The proxy listens exclusively on 127.0.0.1. It is not accessible from your local network, your router, or the internet.
Only talks to Anthropic
Zero telemetry. Zero analytics. No Pruner backend exists. Every outbound byte goes to api.anthropic.com:443 — nothing else.
API key never stored
Your Anthropic API key is forwarded in memory, transparently — identical to how the Claude CLI handles it. Pruner never writes it to disk or to logs.
Open source & auditable
Every line of code is on GitHub under the MIT license. Read it, audit it, or compile the binary yourself — the output is bit-for-bit identical.
Don't trust us — verify it yourself
Run pruner --debug to see a live log of every outbound connection, or verify independently with your OS's own tools:
# Pruner's built-in audit log
$ pruner --debug
→ api.anthropic.com:443
✗ no other connections

# Independent OS verification
$ sudo lsof -i -n -P | grep pruner
pruner → api.anthropic.com:443   (only one remote address)
Commands
Every Claude flag works. A few extras.
Install
No Node.js, no npm, no dependencies. Single binary.
curl (recommended)
Works on macOS and Linux. Detects your architecture automatically.
$ curl -fsSL https://raw.githubusercontent.com/OneGoToAI/Pruner/main/install.sh | bash
Homebrew
macOS only. Easier to update later with brew upgrade pruner.
$ brew install OneGoToAI/tap/pruner
Requirements
- macOS (Apple Silicon or Intel) or Linux x64
- Claude Code CLI installed and logged in
- That's it — no Node.js, no Python, no other dependencies
Configuration
Run pruner config to open ~/.pruner/config.json.
Changes take effect immediately — no restart required.
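The config schema isn't documented on this page. In the sketch below, only maxMessages is mentioned elsewhere here (in the FAQ); the other keys, and every value shown, are hypothetical placeholders included purely to show the shape of a JSON config. Run pruner config to see the real schema.

```json
{
  "maxMessages": 40,
  "contextPruning": true,
  "promptCacheInjection": true
}
```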
Frequently asked
Does Pruner see my API key or code?
Pruner is a local-only proxy — it only listens on 127.0.0.1 and only connects to api.anthropic.com. Your API key is forwarded transparently and never stored.
You can verify this yourself by running pruner --debug, which prints every outbound connection, or by inspecting with:
sudo lsof -i -n -P | grep pruner
Will it change Claude's behavior or break my workflow?
Claude's responses are never touched — Pruner only modifies what you send to Anthropic, not what comes back.
If Claude's context window feels different after aggressive pruning, you can tune maxMessages up in the config, or disable context pruning entirely while keeping prompt cache injection active.
How accurate are the savings figures?
Numbers marked ✓ verified come directly from Anthropic:
- Before token count — from Anthropic's /v1/messages/count_tokens API, called in parallel (zero latency impact)
- After token count — from usage.input_tokens in every API response
- Cache savings — from cache_read_input_tokens in every API response
If the count_tokens call fails (e.g., a network timeout), Pruner falls back to a tiktoken estimate and marks it ~estimated.
Does Pruner add latency to my requests?
Practically zero. The proxy overhead is <1ms. The count_tokens API call runs in parallel with the main request — Claude's generation (typically 3–30 seconds) takes far longer than the token count call.
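The zero-latency claim follows from simple overlap: when the token count runs concurrently with generation, total time is the maximum of the two calls, not their sum. A minimal sketch, with stand-in stubs where the real network calls would go (both stub functions and their timings are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def count_tokens(payload):
    # Stand-in stub for the /v1/messages/count_tokens call (fast).
    time.sleep(0.1)
    return 1234

def generate(payload):
    # Stand-in stub for the main /v1/messages call (slow; dominates latency).
    time.sleep(0.5)
    return {"usage": {"input_tokens": 1200}}

def proxied_request(payload):
    """Start both calls together: total latency equals the slower
    (generation) call, so the token count adds nothing."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        count_future = pool.submit(count_tokens, payload)
        gen_future = pool.submit(generate, payload)
        return count_future.result(), gen_future.result()
```

Run sequentially, the stubs would take 0.6 s; overlapped, the pair finishes in roughly the 0.5 s the slower call needs.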
Start saving today
One command. Zero config. Real savings.
$ curl -fsSL https://raw.githubusercontent.com/OneGoToAI/Pruner/main/install.sh | bash
Find it useful? A ⭐ helps others discover Pruner.
Star on GitHub