$ curl https://api.lxg2it.com/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "standard",
    "prefer": "cheap",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
You pick what matters — the capability tier and the optimisation direction. The router picks the model. When a cheaper option launches or a provider goes down, your requests adapt automatically. No model names to track. No code to change.
model — economy · standard · premium · auto
The capability tier — sets the floor for what models are eligible. auto analyses your conversation context (system prompt, code blocks, message history) to pick the right tier automatically.

prefer — cheap · fast · balanced · quality · coding
The optimisation direction within that tier.
You can also pass a specific model name (gpt-4.1, claude-sonnet-4-6) to pin routing and bypass tier selection entirely.
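The parameters above compose into an ordinary chat-completions body. A minimal Python sketch of the three routing modes — the endpoint and field names come from the curl example above; actually sending the body is only indicated in a comment:

```python
import json

# Three ways to drive routing, per the parameter reference above:
tiered = {                       # tier + preference: the router picks the model
    "model": "standard",
    "prefer": "cheap",
    "messages": [{"role": "user", "content": "Hello"}],
}
auto = {                         # auto: the router infers the tier from context
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello"}],
}
pinned = {                       # specific model name: bypasses tier selection
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Each body is POSTed as JSON to /v1/chat/completions with a Bearer key,
# exactly as the curl example shows.
for body in (tiered, auto, pinned):
    print(json.dumps(body))
```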
Current tiers
economy gemini-2.5-flash · gpt-4.1-mini · o4-mini · claude-haiku-4-5 · grok-3-mini-beta · nvidia.nemotron-nano-3-30b · nvidia.nemotron-nano-9b-v2 · zai.glm-4.7-flash · qwen.qwen3-32b-v1:0 · openai.gpt-oss-120b-1:0 · llama-3.3-70b-versatile · meta-llama/llama-4-scout-17b-16e-instruct · llama3.1-8b · qwen-3-235b-a22b-instruct-2507
standard gemini-2.5-pro · gpt-4.1 · gpt-5.3-chat-latest · gpt-5.3-codex · gpt-5.1-codex-mini · o3 · claude-sonnet-4-6 · grok-3-beta · zai.glm-4.7 · deepseek.v3.2 · mistral.mistral-large-3-675b-instruct · moonshotai.kimi-k2.5 · minimax.minimax-m2.1 · qwen.qwen3-next-80b-a3b · us.meta.llama4-maverick-17b-instruct-v1:0 · us.meta.llama4-scout-17b-instruct-v1:0 · mistral.devstral-2-123b · qwen.qwen3-coder-480b-a35b-v1:0 · nvidia.nemotron-super-3-120b · qwen.qwen3-235b-a22b-2507-v1:0
premium gemini-3.1-pro-preview · claude-opus-4-7 · claude-opus-4-6 · gpt-5.4 · zai.glm-5
Context-window guard: never routes to a model that can't handle your input. Circuit breakers reroute around provider outages automatically.
Free models, no credit card
Fast models via Groq and Cerebras are routed at no cost — no credits, no card required. Sign up and start making requests immediately. Add credits when you need the full range of premium models.
Transparent pricing
A 4% fee on credit deposits. Requests are billed at actual provider market rates — you pay what the model costs, nothing more.
Every response includes X-Model-Router-Model and X-Model-Router-Provider headers so you always know exactly what ran and what it cost.
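As a quick arithmetic sketch of the fee model above — the $100 figure is purely illustrative:

```python
# Hypothetical deposit: the only platform charge is the 4% deposit fee.
deposit = 100.00
fee = round(deposit * 0.04, 2)   # 4% fee on the deposit -> 4.0

# Per-request cost is then the provider's own market rate for whichever
# model the router selected (reported in the X-Model-Router-* headers).
print(fee)
```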
Auto-routing
Set model: "auto" and the router analyses your full conversation context — system prompt, code blocks, message history, tool use, reasoning markers — and picks the right tier automatically.
No heuristics on individual messages; it reads the whole picture.
Every auto-routed response includes X-Model-Router-Auto-Tier and X-Model-Router-Auto-Score headers so you can see exactly why a tier was chosen.
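Both sets of headers can be read off any HTTP client's response object. A sketch using a plain dict to stand in for response headers — the values shown are illustrative, not real routing output:

```python
# Illustrative header values; a real client would expose these via
# e.g. resp.headers in requests/httpx or resp.getheaders() in urllib.
headers = {
    "X-Model-Router-Model": "gpt-4.1",        # what actually ran
    "X-Model-Router-Provider": "openai",      # which provider served it
    "X-Model-Router-Auto-Tier": "standard",   # tier chosen by auto-routing
    "X-Model-Router-Auto-Score": "0.62",      # hypothetical score value
}

model = headers["X-Model-Router-Model"]
tier = headers.get("X-Model-Router-Auto-Tier")  # only set for model: "auto"
print(model, tier)
```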
Automatic failover
Circuit breakers detect provider outages and reroute requests in real time. Context-window guards ensure requests never go to a model that can't handle them. Your code doesn't change — routing adapts automatically.
You stay in control
Block providers you don't want to fund. Set daily spend limits. Enable auto-recharge so you never hit a wall mid-project. Export request traces to any OTLP backend — Axiom, Grafana, Honeycomb, Datadog.
Embeddings included
Same key, same endpoint pattern. embed-small, embed-large, and embed-titan aliases route to the best available embedding model.
Batch inputs, optional dimension truncation, billed at input tokens only.
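A sketch of an embeddings request body under the OpenAI-compatible schema — the "dimensions" field name and the /v1/embeddings path are assumptions based on that schema, not confirmed by the text above:

```python
import json

# Batch embedding request using the embed-small alias.
payload = {
    "model": "embed-small",
    "input": ["first document", "second document"],  # batch inputs
    "dimensions": 256,   # optional truncation knob (assumed field name)
}
body = json.dumps(payload)

# POST body to https://api.lxg2it.com/v1/embeddings (assumed path)
# with the same Bearer key as chat completions.
print(len(payload["input"]))
```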
Get started
1. Sign up and grab an API key.
2. Point any OpenAI-compatible client at https://api.lxg2it.com
3. That's it. Free models work immediately. Add credits for the full range.
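In practice "any OpenAI-compatible client" just needs the base URL and your key. A stdlib-only sketch of the endpoint layout — the `openai`-SDK line is shown as a comment since that package is an external dependency, and its base_url value (with the /v1 suffix) is an assumption:

```python
from urllib.parse import urljoin

BASE = "https://api.lxg2it.com"

# Standard OpenAI-style paths hang off the base URL:
chat_url = urljoin(BASE + "/", "v1/chat/completions")
embed_url = urljoin(BASE + "/", "v1/embeddings")

# With the `openai` SDK this would be one line, e.g.:
#   client = openai.OpenAI(base_url="https://api.lxg2it.com/v1", api_key=KEY)
print(chat_url)
```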