Tamp — Cut your AI coding costs in half


v0.8.0: zip-like 1–9 compression levels, Codex (ChatGPT) and Kimi CLI support. Token proxy for coding agents.

45% fewer tokens.

100% quality retention across 216 A/B tasks. Zero code changes.

Works with Claude Code, Codex CLI, Cursor, Cline, opencode, Aider, Kimi, and OpenClaw.

sanches.free "Saved 30% of tokens IRL — $9.30 in one session" @sanches_free

Install with npm i -g @sliday/tamp && tamp -y, or auto-start via the Claude Code plugin.

Every turn costs more than the last.

Coding agents re-send the full conversation on every API call. Tool results accumulate — file reads, JSON configs, CLI output — all re-sent as input tokens, every single turn.

(Chart: input tokens per call climb from roughly 80K at turn 1 to 100K+ by turn 30.)

200+ API calls per session, each re-sending the full history · 60% of it is tool_result bloat (JSON, files, CLI output) · $6–15 per coding session at $3/Mtok pricing

17 stages. Zero config.

Tamp sits between your agent and the API. It classifies each tool result and applies the right compression — automatically.

Claude Code / Codex CLI / opencode / Aider / Cursor / Cline → tamp:7778 → Anthropic / OpenAI / Gemini
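Because it speaks the upstream wire format, you can sanity-check the proxy with a raw request before pointing an agent at it. A minimal sketch, assuming a BYOK key in $ANTHROPIC_API_KEY and a model name you actually use:

curl -s http://localhost:7778/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":64,"messages":[{"role":"user","content":"ping"}]}'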

  • cmd-strip (new, lossless): strips progress bars and spinners from npm, pip, cargo, docker, git, and pytest output.
  • minify (lossless): strips JSON whitespace; package.json shrinks 22% (illustrated below).
  • toon (lossless): columnar encoding for arrays; file listings shrink 49%.
  • prune (lossless): strips lockfile hashes, registry URLs, and npm metadata (−81% on lockfiles).
  • dedup (lossless): same file read twice? Send a reference, not the content.
  • diff (lossless): tiny edit? Send a patch, not the full file (illustrated below).
  • read-diff (new, lossless): agent re-reads a file? Emit a unified diff against the prior copy, from a session-scoped cache.
  • strip-lines (lossless): removes line-number prefixes from Read tool output.
  • whitespace (lossless): collapses blank lines, trims trailing spaces.
  • llmlingua (neural): LLMLingua-2 token pruning for text; auto-starts a sidecar (−40% on source).
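Two of these stages are easy to approximate at the shell, to get a feel for where the savings come from. These are rough illustrations of the idea, not Tamp's code; jq and local files are assumed:

# What minify buys on JSON (requires jq; jq -c emits compact JSON).
orig=$(wc -c < package.json)
min=$(jq -c . package.json | wc -c)
echo "package.json: ${orig} bytes pretty-printed, ${min} bytes minified"

# The diff-stage idea: after a small edit, a unified patch is far smaller
# than re-sending the whole file. file.js and the edit are hypothetical.
cp file.js /tmp/file.before.js
echo "// one-line change" >> file.js
diff -u /tmp/file.before.js file.js | wc -c    # bytes in the patch
wc -c < file.js                                # bytes in the full file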

Opt-in stages (not enabled by default)

  • strip-comments (opt-in): removes //, /* */, and # comments (−35% on commented code).
  • textpress (opt-in): LLM semantic compression via Ollama or OpenRouter (−73% on stacktraces).
  • graph (opt-in): session-scoped dedup. Works on any coding agent (Codex, Claude Code, Aider) anywhere the same file is read twice (−99% on repeats).
  • br-cache (new, opt-in): Brotli disk store under ~/.cache/tamp/br/. Offloads entries >8KB for persistence and cross-session rehydration. Lossless (sketched below).
  • disclosure (new, lossy): 3-tier summary for tool_result bodies >32KB. Emits a <tamp-ref:v1:HASH:BYTES> marker; the model quotes it back to rehydrate. Aggressive preset only; skipped on dangerous tasks.
  • bm25-trim (new, lossy): pure-JS BM25 ranker. Bodies >64KB get their lines scored against the last user message; low-scoring lines are dropped down to a 4,096-token budget. Aggressive preset only; skipped on dangerous tasks.
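The br-cache idea in particular is easy to picture with the brotli CLI. A concept sketch only, with hypothetical file names; the real store layout and key scheme are Tamp's own:

# Offload a large tool_result body to a Brotli store keyed by content hash.
body=/tmp/tool-result.json            # hypothetical >8KB tool_result body
key=$(shasum -a 256 "$body" | cut -c1-16)
mkdir -p ~/.cache/tamp/br
brotli -q 9 -c "$body" > ~/.cache/tamp/br/"$key".br
# A later session rehydrates by key instead of re-sending the bytes:
brotli -d -c ~/.cache/tamp/br/"$key".br | head -n 3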

Works with Claude Code, Codex CLI, opencode, Aider, Cursor, Cline, and any OpenAI-compatible agent.

One knob. Nine stops.

Pick a compression level like you pick a zip level. Each step adds stages on top of the previous — no need to memorize 17 names.

Level  Adds                            Lossy  Savings  Preset
L1     minify                          no     ~15%
L2     + whitespace, strip-lines       no     ~25%
L3     + cmd-strip                     no     ~35%
L4     + toon, dedup, diff             no     ~45%     conservative
L5     + llmlingua, read-diff, prune   yes    ~53%     balanced (default)
L6     + strip-comments                yes    ~58%
L7     + textpress, br-cache           yes    ~62%
L8     + disclosure, bm25-trim         yes    ~67%     aggressive
L9     + graph, foundation-models      yes    ~72%     max

Environment

TAMP_LEVEL=7 tamp

Interactive

tamp settings

Precedence: --level > TAMP_LEVEL > config file > preset alias > default (L5).
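In practice that means an explicit flag beats an inherited environment variable:

# Both TAMP_LEVEL and the flag are set, but the flag wins: this run uses L7.
export TAMP_LEVEL=5
tamp --level 7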

Wire it up in 30 seconds.

One tamp instance, any coding agent. Point the base URL at localhost:7778 and go.

Anthropic API format — /v1/messages. Works with BYOK and Claude Console OAuth (Pro/Max plans) — tamp forwards the bearer verbatim.

export ANTHROPIC_BASE_URL=http://localhost:7778
claude
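OpenAI-format agents follow the same pattern, assuming the agent honors the SDK-standard OPENAI_BASE_URL variable; whether the /v1 suffix belongs on the base URL depends on the agent's SDK, so adjust if requests 404:

# Assumption: the agent reads OPENAI_BASE_URL (most OpenAI SDK-based tools do).
export OPENAI_BASE_URL=http://localhost:7778/v1
codex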

Lifecycle & health

  • tamp stop: gracefully shuts down a running proxy
  • tamp -y --force: replaces an existing instance
  • tamp status: checks health
  • curl http://localhost:7778/caveman-help: returns the current output mode and classifier rules

If the terminal dies mid-session, the PID file at ~/.config/tamp/tamp-<port>.pid and a SIGHUP handler release the port automatically.

🦞 Works with OpenClaw

Route your AI gateway through Tamp. Every request gets compressed before it hits Anthropic — your agents work the same, your bill doesn't.

Setup in 2 minutes

  1. Run npm i -g @sliday/tamp && tamp -y on your server
  2. Add a provider in your OpenClaw config pointing to http://localhost:7778
  3. Set it as primary model — done. All requests now flow through Tamp.
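Before switching traffic over, confirm the proxy is healthy using the lifecycle commands from the section above:

tamp status                                # proxy health
curl http://localhost:7778/caveman-help    # current output mode + classifier rules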

Chat sessions (Telegram, short turns): 3–5% savings (mostly text, few tool calls).

Coding sessions (file reads, JSON): 30–50% savings (heavy tool_result compression).

70MB RAM. <5ms latency. No Python needed. If Tamp goes down, requests bypass it automatically.

Measured: 45% fewer tokens, 100% quality

A/B tested via OpenRouter on Sonnet, with Haiku 4.5 as judge. Twelve scenarios, 216 live A/B tasks at level 5 (the default). Zero quality regressions.

Per-scenario input savings: Small JSON 21.3% · Large JSON 62.3% · Tabular 60.0% · Source Code 0.0% · Multi-turn 18.4% · Lockfile 78.9% · Dedup Read 9.8%

Per session: $0.68 saved (input + output combined) · Per developer: $75/mo at 5 sessions/day · 10-person team: $9,000/yr. Free and open source. (The math: $0.68 × 5 sessions × ~22 workdays ≈ $75/mo; × 10 developers × 12 months = $9,000/yr.)

Read the white paper · Reproduce the benchmark

Quality verified: 8/8 A/B scenarios — compressed responses identical to uncompressed. Sonnet 4.6, $3/$15 MTok in/out.

sanches.free: "NOT BAD IRL — save 30% of tokens. 7,681 blocks compressed, 3M tokens saved, $9.30 back in my pocket."

[tamp] session 7323.9k chars, 3099180 tokens saved (28.8% avg) $9.2975 saved @ $3/Mtok

Azamat Sultanov: "Works perfectly, saving a bunch of tokens daily, and that's even without using the LM-based optimisers."

[tamp] 1,107 requests, 77 blocks compressed, 26,847 tokens saved (11.4% avg, no llmlingua)

Claude Max? Last 2× longer.

Max subscribers have a fixed token budget. Tamp compresses input tokens before they count against your limit — same work, fewer tokens consumed.

Max 5× ($100/mo) → effectively 10.6× with Tamp

Max 20× ($200/mo) → effectively 42.2×

(A ~53% input reduction at the default level stretches a fixed token budget by roughly 1/(1−0.53) ≈ 2.1×.)

Model In/Out $/MTok Saved/session Per dev/month Team/year
Sonnet 4.6 $3/$15 $0.68 $75 $9,000
Opus 4.6 $15/$75 $3.39 $373 $44,700
Opus 4.6 (extended) $15/$75 $3.39 $373 $44,700

Projection at level 5 default (45% input savings, lossless, 100% quality). Caveman output mode adds a further 20–30pp on output tokens. 10-person team, 5 sessions/day.

Output tokens too.

Tamp compresses input. But Claude also wastes output tokens on filler — "Sure!", "Great question!", unsolicited advice. The Tamp plugin auto-injects token-efficient rules into your project’s CLAUDE.md to cut that too.
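The injected rules read roughly like this; an illustrative paraphrase assembled from the Caveman principles listed later on this page, not the plugin's exact text:

# Illustrative only: append token-efficiency rules to a project's CLAUDE.md.
cat >> ./CLAUDE.md <<'EOF'
## Output rules (token-efficient)
- Run tools first; report results after.
- No preamble ("Let me...", "I will...") and no filler praise.
- Sentence fragments are fine; stop as soon as the result is stated.
- Never echo the user's input back.
EOF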

66.2% fewer output tokens

A/B tested: 8 scenarios × 5 runs × 2 arms (control and treatment) = 80 API calls via OpenRouter with Sonnet 4.6

Before (508 tokens)

“Great question! I'd be happy to help you with this code review. Looking at your loop, I can see there's an interesting issue here. The problem is that you have an off-by-one error on line 12 where i <= arr.length should be i < arr.length. This is a common mistake that many developers make. I hope this helps! Let me know if you have any other questions.”

After (262 tokens, −48.5%)

“Off-by-one error: i <= arr.length should be i < arr.length. Array index goes out of bounds on last iteration.”

Before (95 tokens)

“You're absolutely right that Python was created in 2005! Actually, I should mention that while your enthusiasm is great, Python was actually created by Guido van Rossum and first released in 1991. It's a common misconception though!”

After (24 tokens, −74.7%)

“Python was created by Guido van Rossum, first released in 1991, not 2005.”

Scenario Control Treatment Reduction
Code Review 508 262 48.5%
Concept Explanation 569 327 42.6%
Factual Correction 95 24 74.7%
Refactor Suggestion 139 30 78.5%
Debug Assistance 483 137 71.6%
API Usage 609 246 59.6%
Git Command 334 61 81.6%
Code Generation 922 151 83.7%

Token-weighted across all eight scenarios: 1,238 treatment tokens vs 3,659 control, which is the 66.2% headline above.

Inspired by drona23/claude-token-efficient. More examples →

Caveman Mode Evaluation

Conditional — Safe for simple tasks only

⚠️ Evaluation Results: Not Safe for Production

Comprehensive testing shows Caveman Mode achieves 40–70% output token savings but introduces critical risks for security fixes, debugging, and architectural decisions.

What is Caveman Mode?

An extreme output compression approach that makes Claude Code's responses more token-efficient by:

  • Executing tools before speaking (tool-first)
  • Removing preamble ("Let me...", "I will...")
  • Using sentence fragments
  • Stopping immediately after results
  • Never echoing user input

Task-Type Safety Assessment

Task Type Savings Safe?
Env var additions 80% ✅ Yes
Typos 95% ✅ Yes
Documentation 85% ✅ Yes
New features (trivial) 70% ⚠️ Conditional
Simple refactors 65% ⚠️ Conditional
Debugging 40% ❌ No
Security fixes 50% ❌ No
Performance 45% ❌ No
Architecture 35% ❌ No

Recommended Approach: Task-Type-Aware Compression

Instead of blanket Caveman Mode, use a hybrid approach:

  • Safe zones: Apply full compression to env vars, typos, docs (78% average savings)
  • Danger zones: Use full output for security, debugging, performance, architecture

This achieves 64% overall savings without breaking critical workflows.

Example Failure: Security Fix

❌ Caveman Mode (56% savings)

SSRF found: llmLinguaUrl unvalidated
Fixed: Added localhost-only hostname check
Tests: ✓

✅ Normal Mode

Found vulnerability: SSRF via
config.llmLinguaUrl (compress.js:273).
URL concatenated without validation...

Fix: Added hostname validation to
ensure llmLinguaUrl only points to
localhost (127.0.0.1, ::1).

Edge cases: IPv6 variants handled,
hostname spoofing prevented.

Tests: ✓ Commit: d631c4f

Problem: User can't verify attack vector, IPv6 handling, or hostname spoofing prevention.

See the full evaluation at github.com/sliday/tamp/tree/main/bench

Caveman-Inspired Features

Task-type-aware compression, presets, opt-in output rules

Tamp now integrates the best ideas from JuliusBrussee/caveman for full-spectrum token optimization: input compression (Tamp's strength) + output compression (Caveman's strength).

Compression Presets

Three intensity levels simplify configuration, so there's no need to memorize 17 stage names:

🛡️ Conservative: 45–50% savings, lossless only

  • Cmd-strip, minify, toon, strip-lines
  • Whitespace, dedup, diff
  • No neural compression

⚖️ Balanced: 52–58% savings (default, recommended)

  • All conservative stages
  • LLMLingua neural compression
  • Prune lockfile metadata
  • Read-diff (re-read deltas)

🚀 Aggressive: 65–72% savings, maximum

  • All balanced stages
  • Strip code comments
  • Textpress LLM compression
  • Br-cache disk store
  • Disclosure + bm25-trim (safe tasks)

💡 Usage

# Environment variable
export TAMP_COMPRESSION_PRESET=balanced

# Config file (~/.config/tamp/config)
TAMP_COMPRESSION_PRESET=balanced  # conservative | balanced | aggressive

Task-Type-Aware Output Compression

Tamp injects token-efficient rules into every request based on the user's intent. Safe tasks (env vars, typos, docs) get compressed output; dangerous tasks (security, debugging) get full output. Opt in via TAMP_OUTPUT_MODE=balanced — default is off so existing users see no behavior change.

Mode          Safe tasks  Dangerous tasks
Conservative  40–50%      40–50%
Balanced      65–75%      Full output
Aggressive    75–85%      Partial
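Opting in is a single environment variable; leave it unset and output passes through unchanged:

export TAMP_OUTPUT_MODE=balanced   # or: conservative | aggressive
tamp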

CLI: Compress Config Files

New tamp compress-config tool compresses CLAUDE.md and config files by 40-45% using Tamp's compression pipeline.

# Dry run (preview savings)
tamp compress-config --dry-run ~/.claude/CLAUDE.md

# Compress with backup
tamp compress-config ~/.claude/CLAUDE.md

# Compress multiple files
tamp compress-config ~/.config/tamp/config ~/.claude/CLAUDE.md

🎯 Combined Impact

With the Caveman-inspired features integrated, Tamp now provides 60–70% combined token savings (input + output) in balanced mode: full-spectrum optimization without sacrificing quality on critical tasks.


One command. Zero config.

npm i -g @sliday/tamp && tamp -y

Point your agent at localhost:7778 and go.

Or install the Claude Code plugin: claude plugin marketplace add sliday/claude-plugins && claude plugin install tamp@sliday