IonRouter


High throughput, low cost inference. Powered by IonAttention.

NVIDIA Grace Hopper Superchip

IonAttention Engine

Not just fast hardware.
A faster engine: IonAttention.

Our custom inference stack multiplexes models on a single GPU, swaps them in milliseconds, and adapts to traffic in real time. Built from the ground up for the Grace Hopper architecture.

Throughput (tok/s) on a single GH200 running Qwen2.5-7B:

Top inference provider: ~3,000

Read the deep dive

Custom Models

Bring any model.
Get dedicated streams.

Deploy your finetunes, custom LoRAs, or any open-source model on our fleet. Dedicated GPU streams with no cold starts and per-second billing.

Book a call

What Teams Build on Ion

From robots to
real-time video.

Teams use Ion for high-performance robotics perception, multi-camera surveillance, game asset generation, and AI video pipelines.

Case Study

5 VLMs, 1 GPU.

Five vision-language models on a single GPU — 2,700 video clips, concurrent users, <1s cold starts.

Read the case study

API · Zero Code Changes

Drop in.
Ship faster.

Point your existing OpenAI client at Ion. Any language, any framework. One line change.
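As a sketch of the drop-in flow, the request body is the standard OpenAI chat-completions format; only the base URL changes. The endpoint, model id, and key below are placeholders, not Ion's real values:

```python
import json
import urllib.request

# Hypothetical base URL -- check your Ion dashboard for the real endpoint.
ION_BASE_URL = "https://api.ion.example/v1"

# Standard OpenAI-style chat payload; the model id is a placeholder.
payload = {
    "model": "qwen2.5-7b-instruct",
    "messages": [{"role": "user", "content": "Hello, Ion!"}],
}

req = urllib.request.Request(
    f"{ION_BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_ION_API_KEY",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment once you have a real key
```

With the official OpenAI SDK the same switch really is one line: construct the client with `base_url=ION_BASE_URL` and your Ion key, and every existing call keeps working.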

Models & Pricing

Pay per million tokens. No idle costs.

ZhiPu AI's flagship 600B+ MoE model with state-of-the-art reasoning, coding, and multilingual capabilities, powered by EAGLE speculative decoding on 8x B200 GPUs.

~220 tok/s · $1.20 in / $3.50 out

Try in Playground
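To see what per-million-token pricing means in practice, here is a quick cost calculation using the rates on the card above (the token counts are made-up illustration values):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Cost in USD for one request, given per-million-token prices."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# Rates from the card above: $1.20 per 1M input tokens, $3.50 per 1M output.
cost = request_cost(2_000, 500, in_per_m=1.20, out_per_m=3.50)
print(f"${cost:.5f}")  # a 2k-in / 500-out request costs well under a cent
```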

MoonShot AI's frontier reasoning model designed for long document understanding, multi-step reasoning chains, and complex problem decomposition across technical and scientific domains.

~120 tok/s · $0.20 in / $1.60 out

Try in Playground

MiniMax's flagship 1M-context language model delivering strong reasoning and instruction following across long documents, multi-turn dialogue, and complex analysis.

~120 tok/s · $0.40 in / $1.50 out

Try in Playground

Qwen3.5-122B-A10B · Language

Cumulus's most capable open-source model — a 122B MoE with 10B active parameters rivaling leading proprietary models on coding, reasoning, and multilingual benchmarks.

~120 tok/s · $0.20 in / $1.60 out

Try in Playground

A frontier open-source 120B model delivering cutting-edge reasoning and instruction following comparable to leading closed-source systems, ideal for complex agentic workflows and advanced code generation.

~100 tok/s · $0.020 in / $0.095 out

Try in Playground

Wan2.2 Text-to-Video · Video

A 14B text-to-video model optimized for speed via the FastGen runtime, generating clips in under 10 seconds with strong motion coherence.

~8s/clip · $0.00194 / GPU·sec

Try in Playground
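GPU-second billing converts directly to a per-clip cost. A rough estimate from the card above, assuming a single GPU and the ~8 s generation time:

```python
GPU_SEC_PRICE = 0.00194   # $ per GPU-second, from the card above
CLIP_SECONDS = 8          # ~8 s generation time per clip

# Assumes the clip is generated on one GPU for the full duration.
cost_per_clip = CLIP_SECONDS * GPU_SEC_PRICE
print(f"~${cost_per_clip:.4f} per clip")
```

At these rates a batch of 1,000 clips would land around $15-16 of GPU time.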

Black Forest Labs' fastest Flux model, delivering crisp sub-4-second image generation ideal for real-time applications, prototyping, and high-volume pipelines.

~3s/image · ~$0.005 per image

Try in Playground

Ready to build?

Start in under a minute. No GPU expertise required.