v0.8.0: zip-like 1–9 compression levels + Codex ChatGPT + Kimi CLI support
Token proxy for coding agents
45% fewer tokens.
100% quality retention across 216 A/B tasks. Zero code changes.
Works with Claude Code, Codex CLI, Cursor, Cline, opencode, Aider, Kimi, and OpenClaw.
"Saved 30% of tokens IRL — $9.30 in one session"
— @sanches_free
or auto-start via the Claude Code plugin →
Every turn costs more than the last.
Coding agents re-send the full conversation on every API call. Tool results accumulate — file reads, JSON configs, CLI output — all re-sent as input tokens, every single turn.
- 200+ API calls per session, each sending the full history
- 60% of that is tool_result bloat: JSON, files, CLI output
- $6–15 per coding session at $3/MTok pricing
17 stages. Zero config.
Tamp sits between your agent and the API. It classifies each tool result and applies the right compression — automatically.
Claude Code / Codex CLI / opencode / Aider / Cursor / Cline → tamp:7778 → Anthropic / OpenAI / Gemini
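Conceptually, that hop is just a local HTTP server that rewrites tool_result blocks before forwarding. A minimal sketch of the shape, not Tamp's actual code: the placeholder `compress` stands in for the 17-stage pipeline, auth handling is simplified (Tamp also forwards OAuth bearers verbatim), and streaming is omitted.

```ts
import http from "node:http";

const UPSTREAM = "https://api.anthropic.com";

// Placeholder for the real 17-stage pipeline: collapse runs of blank lines.
const compress = (text: string): string => text.replace(/\n{3,}/g, "\n\n");

http.createServer(async (req, res) => {
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);
  const body = JSON.parse(Buffer.concat(chunks).toString("utf8"));

  // Rewrite tool_result text in place; everything else passes through.
  for (const msg of body.messages ?? []) {
    if (!Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type === "tool_result" && typeof block.content === "string") {
        block.content = compress(block.content);
      }
    }
  }

  const upstream = await fetch(UPSTREAM + req.url, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": String(req.headers["x-api-key"] ?? ""),
      "anthropic-version": String(req.headers["anthropic-version"] ?? "2023-06-01"),
    },
    body: JSON.stringify(body),
  });
  res.writeHead(upstream.status, { "content-type": "application/json" });
  res.end(await upstream.text());
}).listen(7778);
```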
- cmd-strip (new, lossless): Strip progress bars and spinners from npm, pip, cargo, docker, git, pytest output.
- minify (lossless): Strip JSON whitespace. package.json shrinks 22%.
- toon (lossless): Columnar encoding for arrays. File listings shrink 49%.
- prune (lossless): Strip lockfile hashes, registry URLs, npm metadata. −81% on lockfiles.
- dedup (lossless): Same file read twice? Send a reference, not the content (see the sketch after this list).
- diff (lossless): Tiny edit? Send a patch, not the full file.
- read-diff (new, lossless): Agent re-reads a file? Emit a unified diff vs the prior copy. Session-scoped cache.
- strip-lines (lossless): Remove line-number prefixes from Read tool output.
- whitespace (lossless): Collapse blank lines, trim trailing spaces.
- llmlingua (neural): LLMLingua-2 token pruning for text. Auto-starts a sidecar. −40% on source.
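The dedup stage is the simplest to picture. A session-scoped sketch of the idea, with a reference-marker format invented for this example:

```ts
import { createHash } from "node:crypto";

// First occurrence of a tool_result body passes through; repeats are
// replaced by a short reference to the earlier copy. (Marker text is
// illustrative, not Tamp's actual wire format.)
const seen = new Map<string, number>(); // content hash -> turn first seen

function dedup(body: string, turn: number): string {
  const hash = createHash("sha256").update(body).digest("hex").slice(0, 12);
  const firstTurn = seen.get(hash);
  if (firstTurn !== undefined) {
    return `[same content as tool_result in turn ${firstTurn} (${hash})]`;
  }
  seen.set(hash, turn);
  return body;
}
```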
Opt-in stages (not enabled by default)
- strip-comments (opt-in): Remove //, /* */, and # comments. −35% on commented code.
- textpress (opt-in): LLM semantic compression via Ollama or OpenRouter. −73% on stacktraces.
- graph (opt-in): Session-scoped dedup. Works on any coding agent (Codex, Claude Code, Aider) anywhere the same file is read twice. −99% on repeats.
- br-cache (new, opt-in): Brotli disk store under ~/.cache/tamp/br/. Offloads entries >8KB for persistence and cross-session rehydration. Lossless.
- disclosure (new, lossy): 3-tier summary for tool_result bodies >32KB. Emits a <tamp-ref:v1:HASH:BYTES> marker; the model quotes it back to rehydrate. Aggressive only, skipped on dangerous tasks.
- bm25-trim (new, lossy): Pure-JS BM25 ranker (sketched after this list). Bodies >64KB get their lines scored against the last user message; low-score lines are dropped at a 4096-token budget. Aggressive only, skipped on dangerous tasks.
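For bm25-trim, the scoring loop is small enough to sketch in full. This version is illustrative only: the tokenizer, the k1/b constants, and a line budget standing in for the real 4096-token budget are all assumptions.

```ts
const K1 = 1.2, B = 0.75;
const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9_]+/g) ?? [];

// Score each line of a large tool_result against the last user message
// with BM25, treating every line as a tiny document; keep the top lines
// in their original order and drop the rest.
function bm25Trim(body: string, lastUserMessage: string, keepLines: number): string {
  const lines = body.split("\n");
  const docs = lines.map(tokenize);
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N || 1;
  const terms = [...new Set(tokenize(lastUserMessage))];

  // Document frequency of each query term across lines.
  const df = new Map<string, number>();
  for (const t of terms) df.set(t, docs.filter(d => d.includes(t)).length);

  const scores = docs.map(d => terms.reduce((score, t) => {
    const tf = d.filter(w => w === t).length;
    if (tf === 0) return score;
    const n = df.get(t)!;
    const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
    return score + idf * (tf * (K1 + 1)) / (tf + K1 * (1 - B + (B * d.length) / avgLen));
  }, 0));

  const keep = new Set(
    scores.map((_, i) => i).sort((a, b) => scores[b] - scores[a]).slice(0, keepLines),
  );
  return lines.filter((_, i) => keep.has(i)).join("\n");
}
```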
Works with Claude Code, Codex CLI, opencode, Aider, Cursor, Cline, and any OpenAI-compatible agent.
One knob. Nine stops.
Pick a compression level like you pick a zip level. Each step adds stages on top of the previous — no need to memorize 17 names.
| Level | Adds | Lossy | Savings | Preset |
|---|---|---|---|---|
| L1 | minify | — | ~15% | — |
| L2 | + whitespace, strip-lines | — | ~25% | — |
| L3 | + cmd-strip | — | ~35% | — |
| L4 | + toon, dedup, diff | — | ~45% | conservative |
| L5 | + llmlingua, read-diff, prune | yes | ~53% | balanced (default) |
| L6 | + strip-comments | yes | ~58% | — |
| L7 | + textpress, br-cache | yes | ~62% | — |
| L8 | + disclosure, bm25-trim | yes | ~67% | aggressive |
| L9 | + graph, foundation-models | yes | ~72% | max |
Environment: TAMP_LEVEL=7 tamp
Interactive: tamp settings
Precedence: --level > TAMP_LEVEL > config file > preset alias > default (L5).
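The same precedence chain, written out as a sketch. The preset-to-level mapping is taken from the table above (conservative=L4, balanced=L5, aggressive=L8, max=L9); the option names are illustrative.

```ts
const PRESET_LEVEL: Record<string, number> = {
  conservative: 4, balanced: 5, aggressive: 8, max: 9,
};

function resolveLevel(opts: { cliLevel?: number; configLevel?: number; preset?: string }): number {
  const envLevel = process.env.TAMP_LEVEL ? Number(process.env.TAMP_LEVEL) : undefined;
  // --level > TAMP_LEVEL > config file > preset alias > default (L5)
  return opts.cliLevel ?? envLevel ?? opts.configLevel ?? PRESET_LEVEL[opts.preset ?? ""] ?? 5;
}
```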
Wire it up in 30 seconds.
One tamp instance, any coding agent. Point the base URL at localhost:7778 and go.
Anthropic API format — /v1/messages. Works with BYOK and Claude Console OAuth (Pro/Max plans) — tamp forwards the bearer verbatim.
```bash
export ANTHROPIC_BASE_URL=http://localhost:7778
claude
```
Lifecycle & health
- tamp stop gracefully shuts down a running proxy
- tamp -y --force replaces an existing instance
- tamp status checks health
- curl http://localhost:7778/caveman-help returns the current output mode and classifier rules
If the terminal dies mid-session, the PID file at ~/.config/tamp/tamp-<port>.pid and a SIGHUP handler release the port automatically.
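In sketch form (illustrative; 7778 is the default port, and Tamp's actual handler may differ):

```ts
import { unlinkSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";

// Record the PID at startup; remove it and exit on SIGHUP/SIGINT/SIGTERM
// so a dead terminal can't leave the port wedged.
const pidFile = `${homedir()}/.config/tamp/tamp-7778.pid`;
writeFileSync(pidFile, String(process.pid));
for (const sig of ["SIGHUP", "SIGINT", "SIGTERM"] as const) {
  process.on(sig, () => {
    try { unlinkSync(pidFile); } finally { process.exit(0); }
  });
}
```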
🦞 Works with OpenClaw
Route your AI gateway through Tamp. Every request gets compressed before it hits Anthropic — your agents work the same, your bill doesn't.
Setup in 2 minutes
1. Run npm i -g @sliday/tamp && tamp -y on your server.
2. Add a provider in your OpenClaw config pointing to http://localhost:7778.
3. Set it as the primary model. Done: all requests now flow through Tamp.
- Chat sessions (Telegram, short turns): 3–5% savings; mostly text, few tool calls
- Coding sessions (file reads, JSON): 30–50% savings; heavy tool_result compression
70MB RAM. <5ms latency. No Python needed. If Tamp goes down, requests bypass it automatically.
Measured: 45% fewer tokens, 100% quality
A/B tested via OpenRouter with Haiku 4.5 as judge. Twelve scenarios, 216 live A/B tasks at level 5 (default). Zero quality regressions.
| Scenario | Savings |
|---|---|
| Small JSON | 21.3% |
| Large JSON | 62.3% |
| Tabular | 60.0% |
| Source Code | 0.0% |
| Multi-turn | 18.4% |
| Lockfile | 78.9% |
| Dedup Read | 9.8% |
- Per session: $0.68 saved (input + output combined)
- Per developer: $75/mo (5 sessions/day)
- 10-person team: $9,000/yr (free and open source)
Read the white paper · Reproduce the benchmark
Quality verified: 8/8 A/B scenarios — compressed responses identical to uncompressed. Sonnet 4.6, $3/$15 MTok in/out.

"NOT BAD IRL — save 30% of tokens. 7,681 blocks compressed, 3M tokens saved, $9.30 back in my pocket."
[tamp] session 7323.9k chars, 3099180 tokens saved (28.8% avg) $9.2975 saved @ $3/Mtok

"Works perfectly, saving a bunch of tokens daily, and that’s even without using the LM-based optimisers."
[tamp] 1,107 requests, 77 blocks compressed, 26,847 tokens saved (11.4% avg, no llmlingua)
Claude Max? Last 2× longer.
Max subscribers have a fixed token budget. Tamp compresses input tokens before they count against your limit — same work, fewer tokens consumed.
- Max 5× ($100/mo): effective 5× → 10.6×
- Max 20× ($200/mo): effective 20× → 42.2×
Both multipliers follow from the default level's ~53% total savings: 1/(1 − 0.53) ≈ 2.1× more work per budget token.
| Model | In/Out $/MTok | Saved/session | Per dev/month | Team/year |
|---|---|---|---|---|
| Sonnet 4.6 | $3/$15 | $0.68 | $75 | $9,000 |
| Opus 4.6 | $15/$75 | $3.39 | $373 | $44,700 |
| Opus 4.6 (extended) | $15/$75 | $3.39 | $373 | $44,700 |
Projection at the level 5 default (45% input savings, lossless, 100% quality), 10-person team, 5 sessions/day: $0.68 × 5 sessions × 22 workdays ≈ $75 per developer per month; × 10 developers × 12 months ≈ $9,000/yr. Caveman output mode adds a further 20–30pp on output tokens.
Output tokens too.
Tamp compresses input. But Claude also wastes output tokens on filler — "Sure!", "Great question!", unsolicited advice. The Tamp plugin auto-injects token-efficient rules into your project’s CLAUDE.md to cut that too.
66.2% fewer output tokens
A/B tested: 8 scenarios, 5 runs each, 80 API calls via OpenRouter with Sonnet 4.6
Before (508 tokens)
“Great question! I'd be happy to help you with this code review. Looking at your loop, I can see there's an interesting issue here. The problem is that you have an off-by-one error on line 12 where i <= arr.length should be i < arr.length. This is a common mistake that many developers make. I hope this helps! Let me know if you have any other questions.”
After (262 tokens, −48.5%)
“Off-by-one error: i <= arr.length should be i < arr.length. Array index goes out of bounds on last iteration.”
Before (95 tokens)
“You're absolutely right that Python was created in 2005! Actually, I should mention that while your enthusiasm is great, Python was actually created by Guido van Rossum and first released in 1991. It's a common misconception though!”
After (24 tokens, −74.7%)
“Python was created by Guido van Rossum, first released in 1991, not 2005.”
| Scenario | Control | Treatment | Reduction |
|---|---|---|---|
| Code Review | 508 | 262 | 48.5% |
| Concept Explanation | 569 | 327 | 42.6% |
| Factual Correction | 95 | 24 | 74.7% |
| Refactor Suggestion | 139 | 30 | 78.5% |
| Debug Assistance | 483 | 137 | 71.6% |
| API Usage | 609 | 246 | 59.6% |
| Git Command | 334 | 61 | 81.6% |
| Code Generation | 922 | 151 | 83.7% |
Inspired by drona23/claude-token-efficient. More examples →
Caveman Mode Evaluation
Conditional — Safe for simple tasks only
⚠️ Evaluation Results: Not Safe for Production
Comprehensive testing shows Caveman Mode achieves 40–70% output token savings but introduces critical risks for security fixes, debugging, and architectural decisions.
What is Caveman Mode?
An extreme output compression approach that makes Claude Code's responses more token-efficient by:
- Executing tools before speaking (tool-first)
- Removing preamble ("Let me...", "I will...")
- Using sentence fragments
- Stopping immediately after results
- Never echoing user input
Task-Type Safety Assessment
| Task Type | Savings | Safe? |
|---|---|---|
| Env var additions | 80% | ✅ Yes |
| Typos | 95% | ✅ Yes |
| Documentation | 85% | ✅ Yes |
| New features (trivial) | 70% | ⚠️ Conditional |
| Simple refactors | 65% | ⚠️ Conditional |
| Debugging | 40% | ❌ No |
| Security fixes | 50% | ❌ No |
| Performance | 45% | ❌ No |
| Architecture | 35% | ❌ No |
Recommended Approach: Task-Type-Aware Compression
Instead of blanket Caveman Mode, use a hybrid approach:
- Safe zones: Apply full compression to env vars, typos, docs (78% average savings)
- Danger zones: Use full output for security, debugging, performance, architecture
This achieves 64% overall savings without breaking critical workflows.
Example Failure: Security Fix
❌ Caveman Mode (56% savings)
SSRF found: llmLinguaUrl unvalidated Fixed: Added localhost-only hostname check Tests: ✓
✅ Normal Mode
Found vulnerability: SSRF via config.llmLinguaUrl (compress.js:273). URL concatenated without validation... Fix: Added hostname validation to ensure llmLinguaUrl only points to localhost (127.0.0.1, ::1). Edge cases: IPv6 variants handled, hostname spoofing prevented. Tests: ✓ Commit: d631c4f
Problem: User can't verify attack vector, IPv6 handling, or hostname spoofing prevention.
See the full evaluation at github.com/sliday/tamp/tree/main/bench
Caveman-Inspired Features
Task-type-aware compression, presets, opt-in output rules
Tamp now integrates the best ideas from JuliusBrussee/caveman for full-spectrum token optimization: input compression (Tamp's strength) + output compression (Caveman's strength).
Compression Presets
Three intensity levels simplify configuration; no need to memorize all 17 stage names:
🛡️ Conservative: 45–50%, lossless only
- cmd-strip, minify, toon, strip-lines
- whitespace, dedup, diff
- No neural compression

⚖️ Balanced: 52–58%, recommended (default)
- All conservative stages
- LLMLingua neural compression
- Prune lockfile metadata
- read-diff (re-read deltas)

🚀 Aggressive: 65–72%, maximum
- All balanced stages
- Strip code comments
- Textpress LLM compression
- br-cache disk store
- disclosure + bm25-trim (safe tasks only)
💡 Usage
```bash
# Environment variable
export TAMP_COMPRESSION_PRESET=balanced

# Config file (~/.config/tamp/config)
TAMP_COMPRESSION_PRESET=balanced   # conservative | balanced | aggressive
```
Task-Type-Aware Output Compression
Tamp injects token-efficient rules into every request based on the user's intent. Safe tasks (env vars, typos, docs) get compressed output; dangerous tasks (security, debugging) get full output. Opt in via TAMP_OUTPUT_MODE=balanced — default is off so existing users see no behavior change.
| Mode | Safe Tasks | Dangerous Tasks |
|---|---|---|
| Conservative | 40–50% | 40–50% |
| Balanced | 65–75% | Full output |
| Aggressive | 75–85% | Partial |
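The safe/dangerous split implies a small task classifier in front of the rule injection. A keyword-based sketch: the patterns here are made up for illustration, and Tamp's live rules are what /caveman-help reports.

```ts
type Zone = "safe" | "dangerous" | "neutral";

// Illustrative keyword rules: dangerous tasks get full output,
// safe tasks get compressed output, everything else is untouched.
const DANGEROUS = /\b(security|vulnerab\w*|debug\w*|crash|deadlock|performance|architecture)\b/i;
const SAFE = /\b(typo|env var|readme|docs?|comment|rename)\b/i;

function classifyTask(lastUserMessage: string): Zone {
  if (DANGEROUS.test(lastUserMessage)) return "dangerous";
  if (SAFE.test(lastUserMessage)) return "safe";
  return "neutral";
}
```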
CLI: Compress Config Files
The new tamp compress-config tool compresses CLAUDE.md and config files by 40–45% using Tamp's compression pipeline.
```bash
# Dry run (preview savings)
tamp compress-config --dry-run ~/.claude/CLAUDE.md

# Compress with backup
tamp compress-config ~/.claude/CLAUDE.md

# Compress multiple files
tamp compress-config ~/.config/tamp/config ~/.claude/CLAUDE.md
```
🎯 Combined Impact
With Caveman-inspired features integrated, Tamp now provides 60–70% combined token savings (input + output) in balanced mode: full-spectrum optimization without sacrificing quality on critical tasks.
One command. Zero config.
Point your agent at localhost:7778 and go.
Or install the Claude Code plugin: claude plugin marketplace add sliday/claude-plugins && claude plugin install tamp@sliday