CC v2.1.100+ inflates cache_creation by ~20K tokens vs v2.1.98 — same payload, server-side


Summary

Claude Code versions 2.1.100 and 2.1.101 consume ~20,000 more cache_creation_input_tokens per request than v2.1.98, despite sending fewer bytes in the request payload. The inflation is server-side and version-specific (likely User-Agent routing).

This is not a billing-only issue — these tokens enter the model's context window and may affect output quality.

Reproduction

# 1. Install proxy to capture full API request/response bodies
mkdir /tmp/cc-test && cd /tmp/cc-test
npx -y claude-code-logger@1.0.2 start --port 8000 --log-body --merge-sse

# 2. In another terminal — test with older version
export ANTHROPIC_BASE_URL="http://localhost:8000"
claude-2.1.98 --print "1+1"
# Note cache_creation_input_tokens in response

# 3. Same setup — test with newer version
claude-2.1.100 --print "1+1"
# Note cache_creation_input_tokens in response

Each test is a cold cache, single API call, no session state (--print mode). Same machine, same project, same account, minutes apart.
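Once the proxy has captured a few exchanges, the usage numbers can be pulled out of the logged response bodies. A minimal sketch — the log directory and JSON field layout are assumptions; adjust them to whatever claude-code-logger@1.0.2 actually writes:

```python
import glob
import json

def extract_usage(entry: dict) -> tuple:
    """Pull cache token counts out of one captured request/response exchange."""
    usage = entry.get("response", {}).get("usage", {})
    return (usage.get("cache_creation_input_tokens", 0),
            usage.get("cache_read_input_tokens", 0))

# Log path is hypothetical — point it at the proxy's real output directory.
for path in sorted(glob.glob("/tmp/cc-test/logs/*.json")):
    with open(path) as f:
        creation, read = extract_usage(json.load(f))
    print(f"{path}: cache_creation={creation} cache_read={read}")
```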

Evidence

Version    Content-Length (bytes)  cache_creation_input_tokens  cache_read  Total
v2.1.98    169,514                 49,726                       0           49,726
v2.1.100   168,536 (-978 B)        69,922                       0           69,922
v2.1.101   171,903 (+2,389 B)      ~72,000                      0           ~72,000

v2.1.100 sends 978 fewer bytes than v2.1.98 yet is billed 20,196 more tokens. That sign mismatch rules out a client-side payload difference as the cause — the inflation happens server-side, after the request is received.
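The arithmetic behind that conclusion, using the figures from the table:

```python
# Figures copied from the table above (v2.1.98 vs v2.1.100)
bytes_98, tokens_98 = 169_514, 49_726
bytes_100, tokens_100 = 168_536, 69_922

delta_bytes = bytes_100 - bytes_98      # payload size change
delta_tokens = tokens_100 - tokens_98   # billed cache_creation change

# A client-side cause would move both deltas in the same direction;
# here the payload shrank while billing grew.
assert delta_bytes == -978 and delta_tokens == 20_196
print(f"payload: {delta_bytes:+} B, billed: {delta_tokens:+,} tokens")
```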

Cross-account test (same v2.1.98 binary, two different Max accounts): the delta was under 500 tokens — noise level. The inflation is not account-specific.

Interactive mode confirms

/tmp/claude-cache-*.json files across 40+ sessions show a bimodal distribution:

  • Group A: ~50K (cache_read=0, cache_create=~50K) — matches v2.1.98 --print
  • Group B: ~71K (cache_read=0, cache_create=~71K) — matches v2.1.100+ --print

Some sessions start cold at 71K with cache_read=0, confirming this is not accumulated cache — it's the baseline.
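A quick way to bucket those session files into the two groups. The field names (`cache_create`, `cache_read`) are assumptions about the cache-file schema — rename them to match the actual files:

```python
import glob
import json

def classify(cache_create: int) -> str:
    """Threshold midway between the two observed modes (~50K and ~71K)."""
    return "A" if cache_create < 60_000 else "B"

counts = {"A": 0, "B": 0}
for path in glob.glob("/tmp/claude-cache-*.json"):
    with open(path) as f:
        data = json.load(f)
    # Field names are assumptions about the statusline cache schema.
    if data.get("cache_read", 0) == 0:  # cold sessions only
        counts[classify(data.get("cache_create", 0))] += 1

print(f"Group A (~50K, pre-2.1.100): {counts['A']}")
print(f"Group B (~71K, 2.1.100+):   {counts['B']}")
```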

Quality concern

The 20K extra tokens are cache_creation_input_tokens — this means they enter the model's context window, not just the billing ledger. If the server injects additional content invisible to the user:

  • Instruction dilution: Hidden system content may compete with user-provided CLAUDE.md rules, leading to inconsistent agent behavior
  • Reduced effective context: 20K fewer tokens available for actual conversation history — in long sessions this compounds with every turn
  • Unverifiable behavior: Users cannot audit what the model actually "sees" vs what they sent — makes debugging agent misbehavior significantly harder

We observed qualitative differences between v2.1.98 and v2.1.100+ sessions (instruction adherence, tool selection accuracy), but cannot isolate whether this is caused by the extra hidden tokens or other version changes. The lack of transparency makes it impossible to tell.

Additional findings

  • After /login account switch, statusline can jump ±20K — this is cache invalidation (new account_uuid = new cache key), not a billing difference between accounts.
  • 56 count_tokens burst calls observed immediately after the first interactive prompt — these inflate the apparent "first visible" context number in the statusline.
  • Current v2.1.104 is untested — the investigation was done on v2.1.98/100/101. We cannot confirm whether v2.1.104 carries the same inflation.
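The count_tokens burst above can be verified from a proxy capture. A sketch assuming a JSON-lines log with a `url` field per request — the log path and format are assumptions, though `/v1/messages/count_tokens` is the documented endpoint:

```python
import json

def count_token_calls(log_lines) -> int:
    """Count requests to the count_tokens endpoint in a JSON-lines log."""
    n = 0
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if "/v1/messages/count_tokens" in entry.get("url", ""):
            n += 1
    return n

# Usage (path is hypothetical): count_token_calls(open("/tmp/cc-test/proxy.log"))
```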

Impact

~20K extra tokens per session is roughly 40% overhead on a clean project. On a Max plan with usage limits, this means hitting the 5-hour cap significantly faster on newer CC versions.
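The ~40% figure follows directly from the measured baselines:

```python
baseline = 49_726   # v2.1.98 cold-cache cache_creation tokens (from the table)
inflated = 69_922   # v2.1.100
overhead = (inflated - baseline) / baseline
print(f"overhead: {overhead:.1%}")  # ≈ 40.6%
```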

Combined with the quality concern: users are paying more for potentially degraded output, with no visibility into why.

Workaround

Downgrade to v2.1.98 or earlier. Check ~/.local/share/claude/versions/ for available binaries, or use npx claude-code@2.1.98.

Note: Auto-updates may overwrite older versions. v2.1.98 is no longer retained in the local versions directory after v2.1.104 is installed.


Environment

  • Claude Code: v2.1.98 / v2.1.100 / v2.1.101 (tested all three)
  • OS: Linux (WSL2), Windows 11
  • Plan: Max (5x)
  • Install method: native
  • Measurement: HTTP proxy capturing full request/response bodies