llama-dash — self-hosted inference gateway

1 min read Original article ↗

gateway online|running 3 · peer 2|req/s 0.87

LOCAL INFERENCE CONTROL PLANE

One control plane for local inference.

Monitor models, requests, API keys, routing rules, and proxy metrics from one dashboard for llama-swap and compatible upstreams.

WORKS WITHOpenAI SDK·Claude Code·Continue·Open WebUI

OPERATOR DASHBOARD2026-04-30 · 22:01

REQUEST PIPELINE

CLIENTS

OpenAI SDK
Claude Code
Continue · Open WebUI

──▶

llama-dash :3000

dashboard · auth · logs
routing · metrics

──▶

llama-swap :8080

llama.cpp · peers

direct /v1 upstreams

OpenAI · Anthropic

WHAT IT DOES

D01

Watch the box

Live request, token, model, upstream, and GPU status in one dashboard.

M05

Manage models

Load, unload, inspect per-model stats, and edit llama-swap config with validation.

R02

Track requests

Searchable history with filters, histograms, token counts, and cost estimates.

K08

Control access

Hashed API keys, per-key RPM/TPM limits, and model allow-lists.

P10

Enforce policy

Routing rules for model rewrites, passthrough auth, and encrypted credentials.

P06

Test models

Playgrounds for chat, image, speech, and article-to-speech transcription.