GitHub - stephenlthorn/token-lens: Tokenlens is an open-source AI prompt and agent workflow analyzer that finds token waste, repeated context, and prompt caching opportunities to reduce LLM cost and latency.

AI cost intelligence + gateway — local-first.

TokenLens is a transparent proxy + dashboard that sits between your code and every AI provider. It records every API call, shows exactly where your money goes, detects waste, finds savings — and now acts as a full AI gateway: enforce quotas, route traffic, and block dangerous content before it reaches the model.

Zero config — install, point your SDK, done
100% local — your data never leaves your machine
Real-time — live WebSocket feed of every API call
Works with Anthropic, OpenAI, and Google AI
Gateway — kill switches, quotas, model aliasing, fallback chains, PII/injection guardrails

Quick Start

# 1. Install
pip install tokenlens        # or: pipx install tokenlens

# 2. Set up as background service (auto-starts on boot)
tokenlens install

# 3. Open dashboard
tokenlens ui

That's it. Your shell now has ANTHROPIC_BASE_URL, OPENAI_BASE_URL, and GOOGLE_AI_BASE_URL pointed at the TokenLens proxy. Every SDK call flows through TokenLens automatically.

Manual setup (without daemon)

tokenlens daemon --port 8420

export ANTHROPIC_BASE_URL="http://localhost:8420/proxy/anthropic"
export OPENAI_BASE_URL="http://localhost:8420/proxy/openai"
export GOOGLE_AI_BASE_URL="http://localhost:8420/proxy/google"

Architecture

How It Works

Your App → SDK → TokenLens Proxy (localhost:8420) → AI Provider
                      ↓
              1. Budget check       — block if global daily/monthly limit exceeded
              2. Quota check        — block if source limit exceeded or kill switch active
              3. Routing            — resolve alias, weighted balance, latency select
              4. Request guardrails — scan prompt for PII / injection → warn or block
              5. Waste detection    — whitespace bloat, filler, redundant instructions
              6. Token heatmap      — section breakdown: system/tools/context/history/query
              7. Dedup check        — return cached response if TTL hit
              8. Upstream call      — retry fallback chain on 5xx / 429
              9. Response guardrails — scan model output for PII → warn or block
             10. Record + cost      — write to local SQLite
             11. Broadcast          — WebSocket → dashboard, CLI, alerts, webhooks
                      ↓
              Dashboard · tokenlens top · Alerts · Webhooks

TokenLens acts as a transparent HTTP proxy. It enforces policy, records cost, then forwards requests to the real AI provider — modifying only the model field when routing rules apply. Responses are returned to the caller untouched. All data stays in a local SQLite database.

AI Gateway ✨ New

The gateway layer sits in the proxy pipeline and gives you control over what traffic reaches the model, how it's routed, and what content is allowed through.

Kill Switches & Quotas

Block a runaway agent instantly, or set spend/call caps per source or model.

# Via API
curl -X PUT http://localhost:8420/api/config/quotas \
  -H 'Content-Type: application/json' \
  -d '{
    "kill_switches": ["my-agent"],
    "source_limits": [
      {"source": "my-agent", "daily_usd": 5.00, "monthly_usd": 50.00}
    ],
    "model_limits": [
      {"model": "claude-opus-4-6", "daily_calls": 100}
    ]
  }'

Or configure visually in Settings → Per-Source Quotas.

Model Aliasing & Fallback Routing

Swap models transparently, and automatically retry on failures.

curl -X PUT http://localhost:8420/api/config/routing \
  -H 'Content-Type: application/json' \
  -d '{
    "aliases": [
      {"from": "gpt-4", "to": "claude-sonnet-4-6"}
    ],
    "fallback_chains": [
      {"trigger_model": "claude-opus-4-6",
       "fallbacks": ["claude-sonnet-4-6", "claude-haiku-4-5"]}
    ],
    "weights": [
      {"source": "my-agent", "rules": [
        {"model": "claude-haiku-4-5", "weight": 70},
        {"model": "claude-sonnet-4-6", "weight": 30}
      ]}
    ]
  }'

Or configure in Settings → Model Routing.

Content Guardrails

Scan prompts and responses for PII and prompt injection. Choose warn (log only) or block (return 400).

curl -X PUT http://localhost:8420/api/config/guardrails \
  -H 'Content-Type: application/json' \
  -d '{
    "pii_detection": {"enabled": true, "action": "block"},
    "injection_detection": {"enabled": true, "action": "warn"},
    "custom_rules": [
      {"name": "no-ssn", "pattern": "\\d{3}-\\d{2}-\\d{4}", "action": "block"}
    ]
  }'

Or configure in Settings → Content Guardrails.

Features

Cost Intelligence

Feature	Description
Real-time KPIs	Total spend, savings, call count, and token breakdown at a glance
Spend Forecasting	Projected monthly cost with trend analysis and confidence scoring
Token Cost Breakdown	Daily cost split by input, output, cache read, and cache write tokens
Cost Allocation Tags	Per-source cost aggregation — see which tool or agent spends the most
Model Comparison	"What if I switched from Opus to Sonnet?" — instant cost comparison
Budget Caps	Global daily and monthly spend limits with automatic request blocking
Custom Pricing	Override default per-token rates for any model
Cost Anomaly Detection	Rolling mean + 2σ detection of spend, call count, and token spikes
Cost Alerts	Real-time WebSocket alerts when daily spend exceeds threshold
Weekly Digest	Automated Sunday report: spend, top sources, waste, budget projection

Token Intelligence

Feature	Description
Token Waste Detection	Identifies junk tokens: whitespace bloat, polite filler, redundant instructions, empty messages
Output Utilization	Tracks how much of your `max_tokens` budget is actually used per call
Token Heatmap	Breaks down every request into sections: system prompt, tools, context, history, query
History Bloat Tracking	Detects sources where conversation history consumes >60% of input tokens
Model Right-Sizing	Scores call complexity (0–9) and recommends cheaper models for simple tasks

AI Gateway ✨ New

Feature	Description
Kill Switches	Instantly block all traffic from a specific source (tool/agent)
Per-Source Quotas	Daily and monthly spend caps per source, with automatic request blocking
Per-Model Call Limits	Daily call count caps per model to prevent runaway usage
Model Aliasing	Transparently swap one model for another — callers use the alias, proxy uses the real model
Fallback Chains	On 5xx/429, retry with a priority-ordered list of fallback models
Weighted Load Balancing	Split traffic for a source across models by percentage weight
Latency-Based Routing	Route to the lowest-latency provider based on recent P99 measurements
PII Detection	Detect email, phone, SSN, and credit card numbers in prompts and responses
Prompt Injection Detection	Flag attempts to override system prompts or role-hijack the model
Custom Guardrail Rules	Add your own regex patterns with warn or block actions
Request & Response Guardrails	Guardrails apply to both the incoming prompt and the model's response

Developer Playground ✨ New

Feature	Description
Model Selector	Choose provider and model
Temperature Control	Slider (0–2)
Cost Preview	Estimated cost before running
Live Execution	Send prompts through the proxy — tracked in the dashboard
Source Tagging	All playground requests tagged as `playground` for cost attribution

Observability

Feature	Description
Live Feed	Real-time WebSocket stream of every API call with cost
`tokenlens top`	htop-style live terminal view: calls/min, cost/hr, cache %, waste
Cache Hit Tracking	Daily cache hit rate trend with improvement/degradation detection
Provider Health	P50/P95/P99 latency and error rates per provider
Rate Limit Tracking	Hourly 429 error counts and timeline per provider
Request Logging	Optional full request/response body capture for debugging
Session Detection	Groups API calls into logical sessions by timing and source
Prometheus Metrics	`GET /metrics` endpoint for Grafana/Prometheus integration
Request Deduplication	Cache identical requests within a TTL window (opt-in)

Recommendations

Feature	Description
12-Check Engine	Automated analysis: cache utilization, model costs, prompt sizes, waste, history bloat, model right-sizing, and more
Actionable Insights	Each recommendation includes estimated savings impact

Integrations

Feature	Description
CSV Export	Download usage data for spreadsheets or BI tools
Webhook Notifications	HTTP callbacks on `call_recorded`, `cost_alert`, and `weekly_digest` events
Prometheus /metrics	Standard exposition format for monitoring stacks
`tokenlens report`	CLI command to print or export the weekly cost digest

CLI Reference

tokenlens install             # Install as background service (auto-starts on boot)
tokenlens uninstall [--purge] # Remove service (--purge deletes all data)
tokenlens ui [--port 8420]    # Open dashboard in browser
tokenlens daemon              # Run server in foreground (for debugging)
tokenlens status              # Show daemon status
tokenlens top [--port 8420]   # Live htop-style terminal view of API traffic
tokenlens report [--days 7]   # Print weekly cost digest
tokenlens analyze <file|->    # Analyze a prompt trace for waste

`tokenlens top` keys

q / Q    Quit
p / P    Pause / resume live feed

`tokenlens report` options

--days N          Days to include (default: 7)
--format human    Human-readable text (default)
--format json     JSON output for scripting
--port N          Daemon port (default: 8420)

`tokenlens analyze` options

--format human|json     Output format (default: human)
--suggestions           Show full suggestion details
--score-only            Print only the cacheability score
--min-tokens N          Minimum tokens to flag (default: 50)

API Reference

All endpoints are served at http://localhost:8420 (default port).

Usage

Method	Endpoint	Description
`GET`	`/api/usage/kpi?days=30`	KPI summary (cost, savings, tokens)
`GET`	`/api/usage/daily?days=30`	Daily aggregated usage
`GET`	`/api/usage/recent?limit=50`	Recent raw API calls
`GET`	`/api/usage/sources`	Cost breakdown by source
`GET`	`/api/usage/forecast`	Projected monthly spend
`GET`	`/api/usage/by-tag?days=30`	Cost allocation by tag/source
`GET`	`/api/usage/token-breakdown?days=30`	Cost split by token type
`GET`	`/api/usage/cache-trend?days=30`	Cache hit rate over time
`GET`	`/api/usage/compare?from_model=X&to_model=Y`	Model cost comparison
`GET`	`/api/usage/sessions?days=1`	Detected sessions
`GET`	`/api/usage/budget-status`	Current spend vs limits
`GET`	`/api/usage/provider-health?days=1`	Latency and error rates
`GET`	`/api/usage/rate-limits?days=1`	Rate limit (429) events
`GET`	`/api/usage/recommendations`	Cost recommendations
`GET`	`/api/usage/waste-summary?days=30`	Token waste breakdown by type
`GET`	`/api/usage/conversation-efficiency?days=30`	History bloat per source
`GET`	`/api/usage/token-heatmap?days=30`	Token section breakdown per call
`GET`	`/api/usage/anomalies?days=30`	Spend/call/token anomalies
`GET`	`/api/usage/right-sizing?days=30`	Model right-sizing recommendations
`GET`	`/api/usage/digest?days=7`	Weekly cost digest

Settings

Method	Endpoint	Description
`GET/PUT`	`/api/settings/alerts`	Cost alert thresholds
`GET/PUT`	`/api/settings/budget`	Budget cap configuration
`GET/PUT`	`/api/settings/pricing`	Custom pricing overrides
`GET/PUT`	`/api/settings/webhooks`	Webhook notification config

Gateway

Method	Endpoint	Description
`GET/PUT`	`/api/config/quotas`	Per-source quotas, per-model call limits, kill switches
`GET/PUT`	`/api/config/routing`	Model aliases, fallback chains, weighted balancing, latency routing
`GET/PUT`	`/api/config/guardrails`	PII detection, injection detection, custom regex rules
`POST`	`/api/playground/run`	Execute a prompt through the proxy from the Developer Playground

Other

Method	Endpoint	Description
`GET`	`/api/status`	Daemon health and DB stats
`GET`	`/api/export/csv?days=30`	Download usage as CSV
`GET`	`/api/logs?limit=20`	Request/response logs
`GET`	`/api/logs/{id}`	Single log entry detail
`GET`	`/metrics`	Prometheus metrics
`POST`	`/api/analyze`	Analyze prompt for cacheability
`WS`	`/api/live`	Real-time call stream
`ANY`	`/proxy/{provider}/{path}`	Transparent API proxy

Configuration

TokenLens stores its data in ~/.tokenlens/:

~/.tokenlens/
  usage.db        # SQLite database (all usage data)
  config.toml     # Retention settings
  logs/           # Daemon logs

Settings (via API or Dashboard → Settings tab)

Setting	Default	Description
`budget.daily_limit_usd`	—	Global daily spend cap
`budget.monthly_limit_usd`	—	Global monthly spend cap
`budget.enabled`	`false`	Enable budget enforcement
`alerts.daily_cost_threshold`	—	Alert when daily spend exceeds this
`alerts.enabled`	`false`	Enable cost alerts
`webhook.url`	—	Webhook endpoint URL
`webhook.events`	—	Comma-separated: `call_recorded`, `cost_alert`, `weekly_digest`
`webhook.enabled`	`false`	Enable webhooks
`logging.enabled`	`false`	Capture request/response bodies
`dedup.enabled`	`false`	Enable request deduplication
`dedup.ttl_seconds`	`60`	Dedup cache TTL
`quotas.config`	—	Per-source quotas, per-model limits, kill switches (JSON — see Gateway section)
`routing.config`	—	Model aliases, fallback chains, weights, latency routing (JSON — see Gateway section)
`guardrails.config`	—	PII detection, injection detection, custom rules (JSON — see Gateway section)

Development

git clone https://github.com/stephenlthorn/token-lens.git
cd token-lens
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python3 -m pytest tests/ -v

Requirements

Python 3.11+
No external services required

Sponsorship

If TokenLens helps you ship faster or cut token spend, consider sponsoring:

https://github.com/sponsors/stephenlthorn

Tier	Price
Supporter	$5/month
Power User	$25/month
Company Sponsor	$200/month

License

See LICENSE for details.

Quick Start

Manual setup (without daemon)

Architecture

How It Works

AI Gateway ✨ New

Kill Switches & Quotas

Model Aliasing & Fallback Routing

Content Guardrails

Features

Cost Intelligence

Token Intelligence

AI Gateway ✨ New

Developer Playground ✨ New

Observability

Recommendations

Integrations

CLI Reference

tokenlens top keys

tokenlens report options

tokenlens analyze options

API Reference

Usage

Settings

Gateway

Other

Configuration

Settings (via API or Dashboard → Settings tab)

Development

Requirements

Sponsorship

License

`tokenlens top` keys

`tokenlens report` options

`tokenlens analyze` options