GitHub - kuba-guzik/caveman-micro: 6-line caveman micro prompt (85 tokens) that outperformed the original 552-token skill. Benchmark on Claude Sonnet + Opus included.

3 min read Original article ↗
jakguzik_Split_screen_infographic_style_illustration _Left_si_a0d87b94-f6d0-4aef-80ad-1e6ec4ae2cd6_1

6 lines. 85 tokens. Outperformed the 552-token original.

We benchmarked the viral caveman token-saving prompt on real coding tasks. Then we distilled it into 6 lines that beat the original on both Claude Sonnet and Opus.

The micro prompt

Respond like smart caveman. Cut all filler, keep technical substance.
- Drop articles (a, an, the), filler (just, really, basically, actually).
- Drop pleasantries (sure, certainly, happy to).
- No hedging. Fragments fine. Short synonyms.
- Technical terms stay exact. Code blocks unchanged.
- Pattern: [thing] [action] [reason]. [next step].

Copy it into your system prompt, custom instructions (ChatGPT), or CLAUDE.md. Works with any LLM.

Benchmark results

Tested on real tasks (incident diagnosis, config extraction) with structured JSON output and quality verification.

Claude Sonnet

Group Avg Output Tokens Quality Savings
Baseline ("Be concise") 259 100% --
Caveman full (552 tok) 225 100% 13%
Caveman Micro (85 tok) 223 100% 14%

Claude Opus

Group Avg Output Tokens Quality Savings
Baseline ("Be concise") 227 100% --
Caveman full (552 tok) 207 100% 9%
Caveman Micro (85 tok) 180 100% 21%

Quality: 100% in every run. Zero missing facts. The micro version outperformed the full skill on both models.

Why micro beats full

The model already knows how to be concise. It doesn't need a 552-token tutorial. It needs 6 lines of permission.

A longer instruction set costs tokens to inject and gives the model more noise to process. The short nudge does the same job at one-sixth the injection cost.

When to expect what savings

How you use AI Expected savings
Open-ended chat, explanations, brainstorming 40-65%
Structured work (code, JSON, data extraction) 14-21%
API calls at scale 14-21% (compounds across millions of calls)

The bigger your baseline prompt already optimizes for conciseness, the smaller the additional gain. But since quality stays at 100%, the cost of trying is zero.

The real insight

The biggest token savings come from your base prompt, not from output compression.

"Be concise. Return JSON." does more than any skill layered on top. Caveman micro is the last 14-21% on a foundation that already handles 60%.

Use it

Any LLM -- paste the 6 lines into your system prompt or custom instructions.

Claude Code -- add to your CLAUDE.md:

# Token efficiency
Respond like smart caveman. Cut all filler, keep technical substance.
- Drop articles (a, an, the), filler (just, really, basically, actually).
- Drop pleasantries (sure, certainly, happy to).
- No hedging. Fragments fine. Short synonyms.
- Technical terms stay exact. Code blocks unchanged.
- Pattern: [thing] [action] [reason]. [next step].

ChatGPT -- Settings > Personalization > Custom Instructions > paste the 6 lines.

Cursor / Windsurf -- add to your project rules file.

Run the benchmark yourself

git clone https://github.com/kuba-guzik/caveman-micro.git
cd caveman-micro
npm install
npm run bench:live:dry   # preview the plan
npm run bench:live       # run (requires Claude Code CLI)
npm run bench:analyze    # analyze results

Configuration

Variable Default Description
BENCH_MODEL sonnet Model alias (sonnet, opus, haiku)
BENCH_REPS 3 Repetitions per task x group
BENCH_EXPERIMENT all Filter experiments (comma-separated)
BENCH_TASK all Filter tasks (comma-separated)
BENCH_DRY 0 Set 1 to print plan without running

Raw data

Full benchmark results (JSON, CSV, Markdown) are in bench/results/live/.

Article

Full writeup with methodology and analysis: link coming soon

Credits

License

MIT