## Summary
Claude Code versions 2.1.100 and 2.1.101 consume ~20,000 more `cache_creation_input_tokens` per request than v2.1.98, despite sending fewer bytes in the request payload. The inflation is server-side and version-specific (likely User-Agent routing).
This is not a billing-only issue — these tokens enter the model's context window and may affect output quality.
## Reproduction
```shell
# 1. Install proxy to capture full API request/response bodies
mkdir /tmp/cc-test && cd /tmp/cc-test
npx -y claude-code-logger@1.0.2 start --port 8000 --log-body --merge-sse

# 2. In another terminal, test with the older version
export ANTHROPIC_BASE_URL="http://localhost:8000"
claude-2.1.98 --print "1+1"    # Note cache_creation_input_tokens in the response

# 3. Same setup, test with the newer version
claude-2.1.100 --print "1+1"   # Note cache_creation_input_tokens in the response
```
Each test is a cold cache, single API call, no session state (--print mode). Same machine, same project, same account, minutes apart.
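The figures below come from the `usage` object in the captured Messages API response bodies. A minimal sketch of pulling the number out of a captured body without extra tooling — the sample values here are inlined for illustration, not taken from a live capture:

```shell
# Sketch: extract cache_creation_input_tokens from a captured Messages API
# response body. The usage object shape follows the public Anthropic API;
# the sample values below are illustrative, not a real capture.
resp='{"usage":{"input_tokens":4,"cache_creation_input_tokens":49726,"cache_read_input_tokens":0,"output_tokens":7}}'
printf '%s' "$resp" | grep -oE '"cache_creation_input_tokens":[0-9]+' | cut -d: -f2  # prints 49726
```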
## Evidence
| Version | Content-Length (bytes) | cache_creation_input_tokens | cache_read | Total |
|---|---|---|---|---|
| v2.1.98 | 169,514 | 49,726 | 0 | 49,726 |
| v2.1.100 | 168,536 (-978 B) | 69,922 | 0 | 69,922 |
| v2.1.101 | 171,903 (+2,389 B) | ~72,000 | 0 | ~72,000 |
v2.1.100 sends 978 fewer bytes than v2.1.98 yet is billed 20,196 MORE tokens. This rules out a client-side payload difference as the cause; the inflation happens server-side, after the request is received.
Cross-account test (same v2.1.98 binary, two different Max accounts): delta under 500 tokens, which is within noise. The inflation is not account-specific.
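The two headline deltas between v2.1.98 and v2.1.100 can be checked directly from the table values:

```shell
# Deltas between v2.1.98 and v2.1.100, from the table above
v98_bytes=169514;  v100_bytes=168536
v98_tokens=49726;  v100_tokens=69922
echo "byte delta:  $((v100_bytes - v98_bytes))"    # -978
echo "token delta: $((v100_tokens - v98_tokens))"  # 20196
```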
## Interactive mode confirms
`/tmp/claude-cache-*.json` files across 40+ sessions show a bimodal distribution:
- Group A: ~50K (`cache_read=0`, `cache_create=~50K`), matching v2.1.98 `--print`
- Group B: ~71K (`cache_read=0`, `cache_create=~71K`), matching v2.1.100+ `--print`
Some sessions start cold at 71K with cache_read=0, confirming this is not accumulated cache — it's the baseline.
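The bimodal split can be recovered with a small classifier over the per-session totals. A sketch: in practice the numbers would be read out of the `/tmp/claude-cache-*.json` files, but since we have not pinned down their schema here, sample values are piped in to keep it self-contained:

```shell
# Sketch: classify per-session cache_creation totals into the two observed
# modes (~50K vs ~71K), using 60K as the dividing line.
bucket_tokens() {
  awk '{ if ($1 < 60000) a++; else b++ }
       END { printf "Group A (~50K): %d\nGroup B (~71K): %d\n", a+0, b+0 }'
}

# Sample values standing in for real per-session totals
printf '%s\n' 49726 69922 71850 50103 | bucket_tokens
```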
## Quality concern
The 20K extra tokens are `cache_creation_input_tokens`, which means they enter the model's context window, not just the billing ledger. If the server injects additional content invisible to the user:
- Instruction dilution: Hidden system content may compete with user-provided CLAUDE.md rules, leading to inconsistent agent behavior
- Reduced effective context: 20K fewer tokens available for actual conversation history — in long sessions this compounds with every turn
- Unverifiable behavior: Users cannot audit what the model actually "sees" vs what they sent — makes debugging agent misbehavior significantly harder
We observed qualitative differences between v2.1.98 and v2.1.100+ sessions (instruction adherence, tool selection accuracy), but cannot isolate whether this is caused by the extra hidden tokens or other version changes. The lack of transparency makes it impossible to tell.
## Additional findings
- After a `/login` account switch, the statusline can jump ±20K. This is cache invalidation (new `account_uuid` = new cache key), not a billing difference between accounts.
- 56 `count_tokens` burst calls were observed immediately after the first interactive prompt; these inflate the apparent "first visible" context number in the statusline.
- Current v2.1.104 is untested; the investigation covered v2.1.98/100/101. We cannot confirm whether v2.1.104 carries the same inflation.
## Impact
~20K extra tokens per session is roughly a 40% overhead over the ~50K v2.1.98 baseline on a clean project. On a Max plan with usage limits, this means hitting the 5-hour cap significantly faster on newer CC versions.
Combined with the quality concern: users are paying more for potentially degraded output, with no visibility into why.
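For reference, the ~40% figure follows directly from the measured values:

```shell
# Overhead of the v2.1.100 baseline relative to v2.1.98
awk 'BEGIN { printf "overhead: %.1f%%\n", 100 * (69922 - 49726) / 49726 }'  # overhead: 40.6%
```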
## Workaround
Downgrade to v2.1.98 or earlier. Check `~/.local/share/claude/versions/` for available binaries, or use `npx claude-code@2.1.98`.
Note: auto-updates may overwrite older versions. v2.1.98 was no longer retained in the local versions directory after v2.1.104 was installed.
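If you script the downgrade, a version guard helps catch a silent auto-update. A sketch using `sort -V` (GNU coreutils) for the comparison; the 2.1.98 baseline is from this report, everything else is illustrative:

```shell
# Sketch: warn when the installed CLI is newer than the last known-good
# release. Relies on GNU `sort -V` for version ordering.
is_newer() {  # usage: is_newer <candidate> <baseline>; true if candidate > baseline
  [ "$1" != "$2" ] && [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

if is_newer "2.1.104" "2.1.98"; then
  echo "warning: running a version newer than the known-good 2.1.98"
fi
```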
## Related
- #45515, "Phantom ~22K tokens per session on one account vs another — same machine, same project": the original phantom token report (the account-specific delta is now understood as a cache-invalidation artifact)
- Reddit investigation with full proxy data: https://www.reddit.com/r/ClaudeCode/comments/1sj10ou/
- Community reports of rapid limit exhaustion correlate with v2.1.100+ rollout timeline
## Environment
- Claude Code: v2.1.98 / v2.1.100 / v2.1.101 (tested all three)
- OS: Linux (WSL2), Windows 11
- Plan: Max (5x)
- Install method: native
- Measurement: HTTP proxy capturing full request/response bodies