$ curl https://api.lxg2it.com/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "standard",
    "prefer": "cheap",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
You pick what matters — the capability tier and the optimisation direction. The router picks the model. When a cheaper option launches or a provider goes down, your requests adapt automatically. No model names to track. No code to change.
model — economy · standard · premium · auto
The capability tier — sets the floor for what models are eligible. auto analyses your conversation context (system prompt, code blocks, message history) to pick the right tier automatically.

prefer — cheap · fast · balanced · quality · coding
The optimisation direction within that tier.
You can also pass a specific model name (gpt-4.1, claude-sonnet-4-6) to pin routing and bypass tier selection entirely.
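The parameters above compose into an ordinary chat-completions body. A minimal Python sketch of the three routing modes — the endpoint and field names come from the curl example above; actually sending the body is only indicated in a comment:

```python
import json

# Three ways to drive routing, per the parameter reference above:
tiered = {                       # tier + preference: the router picks the model
    "model": "standard",
    "prefer": "cheap",
    "messages": [{"role": "user", "content": "Hello"}],
}
auto = {                         # auto: the router infers the tier from context
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
}
pinned = {                       # specific model name: bypasses tier selection
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Each body is POSTed as JSON to /v1/chat/completions with a Bearer key,
# exactly as the curl example shows.
for body in (tiered, auto, pinned):
    print(json.dumps(body))
```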
Current tiers
economy gemini-2.5-flash · gpt-4.1-mini · o4-mini · claude-haiku-4-5 · grok-3-mini-beta · nvidia.nemotron-nano-3-30b · nvidia.nemotron-nano-9b-v2 · zai.glm-4.7-flash · qwen.qwen3-32b-v1:0 · openai.gpt-oss-120b-1:0 · llama-3.3-70b-versatile · meta-llama/llama-4-scout-17b-16e-instruct · llama3.1-8b · qwen-3-235b-a22b-instruct-2507
standard gemini-2.5-pro · gpt-4.1 · gpt-5.3-chat-latest · gpt-5.3-codex · gpt-5.1-codex-mini · o3 · claude-sonnet-4-6 · grok-3-beta · zai.glm-4.7 · deepseek.v3.2 · mistral.mistral-large-3-675b-instruct · moonshotai.kimi-k2.5 · minimax.minimax-m2.1 · qwen.qwen3-next-80b-a3b · us.meta.llama4-maverick-17b-instruct-v1:0 · us.meta.llama4-scout-17b-instruct-v1:0 · mistral.devstral-2-123b · qwen.qwen3-coder-480b-a35b-v1:0 · nvidia.nemotron-super-3-120b · qwen.qwen3-235b-a22b-2507-v1:0
premium gemini-3.1-pro-preview · claude-opus-4-7 · claude-opus-4-6 · gpt-5.4 · zai.glm-5
Context-window guard: never routes to a model that can't handle your input. Circuit breakers reroute around provider outages automatically.
Free models, no credit card
Fast models via Groq and Cerebras are routed at no cost — no credits, no card required. Sign up and start making requests immediately. Add credits when you need the full range of premium models.
Transparent pricing
A 4% fee on credit deposits. Requests are billed at actual provider market rates — you pay what the model costs, nothing more.
Every response includes X-Model-Router-Model and X-Model-Router-Provider headers so you always know exactly what ran and what it cost.
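As a quick arithmetic sketch of the fee model above — the $100 figure is purely illustrative:

```python
# Hypothetical deposit: the only platform charge is the 4% deposit fee.
deposit = 100.00
fee = round(deposit * 0.04, 2)   # 4% fee on the deposit -> 4.0

# Per-request cost is then the provider's own market rate for whichever
# model the router selected (reported in the X-Model-Router-* headers).
print(fee)
```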
Auto-routing
Set model: "auto" and the router analyses your full conversation context — system prompt, code blocks, message history, tool use, reasoning markers — and picks the right tier automatically.
No heuristics on individual messages; it reads the whole picture.
Every auto-routed response includes X-Model-Router-Auto-Tier and X-Model-Router-Auto-Score headers so you can see exactly why a tier was chosen.
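Both sets of headers can be read off any HTTP client's response object. A sketch using a plain dict to stand in for response headers — the values shown are illustrative, not real routing output:

```python
# Illustrative header values; a real client would expose these via
# e.g. resp.headers in requests/httpx or resp.getheaders() in urllib.
headers = {
    "X-Model-Router-Model": "gpt-4.1",        # what actually ran
    "X-Model-Router-Provider": "openai",      # which provider served it
    "X-Model-Router-Auto-Tier": "standard",   # tier chosen by auto-routing
    "X-Model-Router-Auto-Score": "0.62",      # hypothetical score value
}

model = headers["X-Model-Router-Model"]
tier = headers.get("X-Model-Router-Auto-Tier")  # only set for model: "auto"
print(model, tier)
```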
Automatic failover
Circuit breakers detect provider outages and reroute requests in real time. Context-window guards ensure requests never go to a model that can't handle them. Your code doesn't change — routing adapts automatically.
You stay in control
Block providers you don't want to fund. Set daily spend limits. Enable auto-recharge so you never hit a wall mid-project. Export request traces to any OTLP backend — Axiom, Grafana, Honeycomb, Datadog.
Embeddings included
Same key, same endpoint pattern. embed-small, embed-large, and embed-titan aliases route to the best available embedding model.
Batch inputs, optional dimension truncation, billed at input tokens only.
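A sketch of an embeddings request body under the OpenAI-compatible schema — the "dimensions" field name and the /v1/embeddings path are assumptions based on that schema, not confirmed by the text above:

```python
import json

# Batch embedding request using the embed-small alias.
payload = {
    "model": "embed-small",
    "input": ["first document", "second document"],  # batch inputs
    "dimensions": 256,   # optional truncation knob (assumed field name)
}
body = json.dumps(payload)

# POST body to https://api.lxg2it.com/v1/embeddings (assumed path)
# with the same Bearer key as chat completions.
print(len(payload["input"]))
```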
Get started
1. Sign up and grab an API key.
2. Point any OpenAI-compatible client at https://api.lxg2it.com
3. That's it. Free models work immediately. Add credits for the full range.
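In practice "any OpenAI-compatible client" just needs the base URL and your key. A stdlib-only sketch of the endpoint layout — the `openai`-SDK line is shown as a comment since that package is an external dependency, and its base_url value (with the /v1 suffix) is an assumption:

```python
from urllib.parse import urljoin

BASE = "https://api.lxg2it.com"

# Standard OpenAI-style paths hang off the base URL:
chat_url = urljoin(BASE + "/", "v1/chat/completions")
embed_url = urljoin(BASE + "/", "v1/embeddings")

# With the `openai` SDK this would be one line, e.g.:
#   client = openai.OpenAI(base_url="https://api.lxg2it.com/v1", api_key=KEY)
print(chat_url)
```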