gateway online|running 3 · peer 2|req/s 0.87
LOCAL INFERENCE CONTROL PLANE
One control plane for local inference.
Monitor models, requests, API keys, routing rules, and proxy metrics from one dashboard for llama-swap and compatible upstreams.
WORKS WITHOpenAI SDK·Claude Code·Continue·Open WebUI
OPERATOR DASHBOARD2026-04-30 · 22:01
REQUEST PIPELINE
CLIENTS
OpenAI SDK
Claude Code
Continue · Open WebUI
──▶
llama-dash :3000
dashboard · auth · logs
routing · metrics
──▶
llama-swap :8080
llama.cpp · peers
direct /v1 upstreams
OpenAI · Anthropic
WHAT IT DOES
D01
Watch the box
Live request, token, model, upstream, and GPU status in one dashboard.
M05
Manage models
Load, unload, inspect per-model stats, and edit llama-swap config with validation.
R02
Track requests
Searchable history with filters, histograms, token counts, and cost estimates.
K08
Control access
Hashed API keys, per-key RPM/TPM limits, and model allow-lists.
P10
Enforce policy
Routing rules for model rewrites, passthrough auth, and encrypted credentials.
P06
Test models
Playgrounds for chat, image, speech, and article-to-speech transcription.