High-throughput, low-cost inference. Powered by IonAttention.

IonAttention Engine
Not just fast hardware.
A faster engine: IonAttention.
Our custom inference stack multiplexes models on a single GPU, swaps them in milliseconds, and adapts to traffic in real time. Built from the ground up for Grace Hopper.
Throughput (tok/s): single GH200, Qwen2.5-7B
Top inference provider: ~3,000
Custom Models
Bring any model.
Get dedicated streams.
Deploy your finetunes, custom LoRAs, or any open-source model on our fleet. Dedicated GPU streams with no cold starts and per-second billing.
What Teams Build on Ion
From robots to
real-time video.
Teams use Ion for high-performance robotics perception, multi-camera surveillance, game-asset generation, and AI video pipelines.
Case Study
5 VLMs, 1 GPU.
Five vision-language models on a single GPU — 2,700 video clips, concurrent users, <1s cold starts.
Read the case study
API · Zero Code Changes
Drop in.
Ship faster.
Point your existing OpenAI client at Ion. Any language, any framework. One line change.
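The "one line change" is pointing your client's base URL at an OpenAI-compatible endpoint. A minimal sketch of the equivalent raw request, using only the Python standard library; the base URL and model id below are placeholders, not Ion's real values:

```python
import json
from urllib import request

BASE_URL = "https://api.example-ion.dev/v1"  # placeholder, not Ion's real endpoint

# Standard OpenAI-style chat-completions payload; model id is hypothetical.
payload = {
    "model": "qwen2.5-7b",
    "messages": [{"role": "user", "content": "Say hello."}],
}

# Build (but don't send) the POST request the OpenAI client would issue.
req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer YOUR_ION_API_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# request.urlopen(req) would send it.
```

With an official OpenAI SDK the same switch is just constructing the client with `base_url` pointed at the new endpoint; nothing else in your code changes.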
Models & Pricing
Pay per million tokens. No idle costs.
ZhiPu AI's flagship 600B+ MoE model with state-of-the-art reasoning, coding, and multilingual capabilities, powered by EAGLE speculative decoding on 8x B200 GPUs.
~220 tok/s · $1.20 in · $3.50 out
Try in Playground
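Per-million-token billing makes request cost a one-line calculation. A minimal sketch of the arithmetic, using the rates on the card above (the token counts are hypothetical):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# 10k input + 2k output tokens at $1.20 in / $3.50 out per million:
cost = request_cost(10_000, 2_000, 1.20, 3.50)
print(f"${cost:.4f}")  # → $0.0190
```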
MoonShot AI's frontier reasoning model designed for long document understanding, multi-step reasoning chains, and complex problem decomposition across technical and scientific domains.
~120 tok/s · $0.20 in · $1.60 out
Try in Playground
MiniMax's flagship 1M-context language model delivering strong reasoning and instruction following across long documents, multi-turn dialogue, and complex analysis.
~120 tok/s · $0.40 in · $1.50 out
Try in Playground
Qwen3.5-122B-A10B · Language
Cumulus's most capable open-source model — a 122B MoE with 10B active parameters rivaling leading proprietary models on coding, reasoning, and multilingual benchmarks.
~120 tok/s · $0.20 in · $1.60 out
Try in Playground
A frontier open-source 120B model delivering cutting-edge reasoning and instruction following comparable to leading closed-source systems, ideal for complex agentic workflows and advanced code generation.
~100 tok/s · $0.020 in · $0.095 out
Try in Playground
Wan2.2 Text-to-Video · Video
A 14B text-to-video model optimized for speed via the FastGen runtime, generating clips in under 10 seconds with strong motion coherence.
~8s/clip · $0.00194 / GPU·sec
Try in Playground
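Per-GPU-second billing is just generation time multiplied by the rate. A minimal sketch using the figures on the card above:

```python
def clip_cost(gpu_seconds: float, rate_per_gpu_sec: float) -> float:
    """Dollar cost of one generated clip under per-GPU-second billing."""
    return gpu_seconds * rate_per_gpu_sec

# ~8 s per clip at $0.00194 / GPU-second works out to about $0.0155 per clip.
print(round(clip_cost(8.0, 0.00194), 5))
```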
Black Forest Labs' fastest Flux model, delivering crisp sub-4-second image generation ideal for real-time applications, prototyping, and high-volume pipelines.
~3s/image · ~$0.005 / image
Try in Playground
Ready to build?
Start in under a minute. No GPU expertise required.