34 days after a $619M IPO, MiniMax claims SOTA coding performance at a fraction of competitor pricing
If you want these landing in your inbox regularly, subscribe to my newsletter.
The news
MiniMax released M2.5 on February 12, 2026, and the official announcement makes a claim that would have seemed absurd six months ago: a Chinese model matching Claude Opus 4.5 on coding benchmarks at roughly one-twentieth the output price.
The numbers are stark. SWE-Bench Verified: 80.2%. That puts M2.5 in a statistical dead heat with Anthropic’s flagship coding model. Multi-SWE-Bench: 51.3%. BrowseComp: 76.3% with context management. These are not mid-tier results.

Two variants launched simultaneously. The standard M2.5 costs $0.15 per million input tokens and $1.20 per million output tokens, generating at 50 tokens per second. M2.5-Lightning doubles those prices and doubles the speed, hitting 100 tokens per second at $0.30/M input and $2.40/M output.
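The variant tradeoff is easy to make concrete: paying double buys half the wall-clock time for the same response. A minimal sketch using the stated prices and speeds (the 10,000-token response size is an illustrative assumption):

```python
# Cost and generation time per response for each M2.5 variant,
# derived from the published output prices and token speeds.
VARIANTS = {
    "M2.5":           {"out_usd_per_mtok": 1.20, "tok_per_s": 50},
    "M2.5-Lightning": {"out_usd_per_mtok": 2.40, "tok_per_s": 100},
}

def generation(variant: str, output_tokens: int) -> tuple[float, float]:
    """Return (cost in dollars, wall-clock seconds) for one response."""
    v = VARIANTS[variant]
    cost = output_tokens * v["out_usd_per_mtok"] / 1_000_000
    seconds = output_tokens / v["tok_per_s"]
    return cost, seconds

for name in VARIANTS:  # assume a 10,000-token response
    cost, secs = generation(name, 10_000)
    print(f"{name}: ${cost:.3f} in {secs:.0f}s")
```

For a 10,000-token response, Lightning costs $0.024 instead of $0.012 but finishes in 100 seconds instead of 200.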
The pricing earthquake
The price gap goes well beyond a rounding error.
| Model | Input ($/M) | Output ($/M) | Output ratio vs M2.5 |
|---|---|---|---|
| MiniMax M2.5 | $0.15 | $1.20 | 1x |
| MiniMax M2.5-Lightning | $0.30 | $2.40 | 2x |
| DeepSeek-R1 | $0.55 | $2.19 | 1.8x |
| GPT-5.1 | $1.25 | $10.00 | 8.3x |
| Gemini 3 Pro | $2.00 | $12.00 | 10x |
| Claude Opus 4.5 | $5.00 | $25.00 | 20.8x |
MiniMax puts it bluntly in their technical documentation: the cost of M2.5 is one-tenth to one-twentieth that of Opus, Gemini 3 Pro, and GPT-5 based on output price. The table above roughly bears that out, with output-price ratios running from 8.3x to 20.8x.
For a developer running heavy inference workloads, this is the difference between a $500 monthly bill and a $10,000 one. At scale, infrastructure costs that once required budget approval become line items small enough to expense without a second thought. Claude Opus is twenty times more expensive on output.
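To see where figures like that come from, here is a minimal monthly-bill sketch using the prices in the table; the token volumes (100M input, 400M output per month) are illustrative assumptions, not measured usage:

```python
# Monthly bill at the per-million-token prices quoted above.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "MiniMax M2.5":    (0.15, 1.20),
    "Claude Opus 4.5": (5.00, 25.00),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollars per month for the given token volumes (in millions)."""
    p_in, p_out = PRICES[model]
    return input_mtok * p_in + output_mtok * p_out

# Assumed heavy-inference workload: 100M input + 400M output tokens/month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 100, 400):,.0f}")
```

With those assumed volumes the bill lands at $495 for M2.5 and $10,500 for Opus, the order-of-magnitude gap described above.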

The Burnwise pricing tracker confirms these figures align with current market rates for frontier models. MiniMax is not discounting into oblivion; they are pricing at a structural level that Western competitors have not touched.
Benchmark claims and verification caveats
80.2% on SWE-Bench Verified places M2.5 at the frontier. The HuggingFace model card provides methodology details, but independent verification remains thin.
Standard disclaimers apply. Benchmark gaming is an industry sport. SWE-Bench has seen contested results where aggressive test-time compute and prompt engineering inflate scores that don’t translate to real-world performance. MiniMax has opened model weights for scrutiny, which is more than some competitors offer, but the broader research community has not yet stress-tested these claims.
What is verifiable: the model is available through OpenRouter for immediate use. Developers can run their own evaluations. The pricing is real. The benchmarks are claims that may or may not survive third-party replication.
“Based on output price, the cost of M2.5 is one-tenth to one-twentieth that of Opus, Gemini 3 Pro, and GPT-5.” — MiniMax technical documentation
The Forge framework
MiniMax credits a custom reinforcement learning infrastructure called Forge for the performance gains. Their post-training writeup describes a 40x training speedup over baseline approaches, enabled by 200,000+ training environments.
The technical claim addresses the compute bottleneck. If Forge delivers 40x efficiency gains, the economics of frontier model training shift. A training run that would cost $100 million at standard efficiency might cost $2.5 million with Forge-style optimisation. That's the difference between a training run requiring a dedicated mega-round and one that fits inside a Series A budget.
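That back-of-envelope arithmetic written out explicitly; the $100 million baseline is a hypothetical, not a published figure:

```python
# Back-of-envelope: how a claimed 40x training-efficiency gain changes
# the cost of a single run. The baseline figure is hypothetical.
BASELINE_COST = 100_000_000  # dollars at standard efficiency (assumed)
FORGE_SPEEDUP = 40           # MiniMax's claimed Forge speedup

forge_cost = BASELINE_COST / FORGE_SPEEDUP
print(f"${forge_cost:,.0f}")  # → $2,500,000
```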
Forge focuses on agentic capabilities, training models to use tools, browse the web, and manage long-horizon tasks. The BrowseComp score of 76.3% reflects this focus. MiniMax is building agent infrastructure where the model serves as the reasoning core.

The platform claims 10,000+ user-built “Experts” on their MiniMax Agent system. These are custom workflows and tools built by developers on top of the underlying model. The number suggests genuine adoption, not just API keys claimed and left unused.
The Chinese AI wave context
MiniMax didn’t emerge from nowhere. The company went public on January 9, 2026, raising $619 million in a Hong Kong IPO that valued the company at over $11.5 billion on debut. The stock jumped 109% on its first trading day. Forbes reported that founder Yan Junjie became a billionaire in the process.
Thirty-four days passed between that IPO and the M2.5 release. That timing isn’t coincidental. Public markets demand growth narratives, and delivering a frontier model at disruptive pricing within weeks of listing sends a clear signal to investors about R&D velocity.
MiniMax isn’t alone either. Xinhua reports that Zhipu released GLM-5 on February 11, one day before MiniMax’s announcement. Chinese labs are clustering releases in a way that suggests both state support and intense domestic competition. Western labs now face simultaneous pressure from multiple Chinese competitors rather than a single DeepSeek-style surprise.
Hacker News discussion captured the shift. One commenter called MiniMax their “fast workhorse for tool calling.” Another noted that Chinese models are delivering “high quality drops for the perfect trifecta of leading models.” Developer sentiment has moved from skepticism to qualified acceptance.

What to watch next
Three questions will determine whether M2.5 reshapes the market or becomes another discounted model fighting for scraps.
First, does independent benchmarking confirm the 80.2% SWE-Bench score? If third-party evaluations show regression to 70% or below, the pricing advantage matters less. Developers will pay premium rates for reliability; they won’t pay any rate for uncertainty.
Second, how do Western labs respond? Anthropic, OpenAI, and Google have not faced genuine price competition from Chinese models with claimed frontier performance. A price war would compress margins across the industry. The alternative is segmentation, where Western labs retreat to enterprise contracts and regulatory moats while Chinese models dominate price-sensitive markets.
Third, what does MiniMax ship next? A single model release proves capability. A sustained cadence of improvements proves the underlying infrastructure works. The 34-day gap between IPO and M2.5 suggests velocity, but velocity needs to continue.
The pricing is real. The benchmarks are claims. The market will sort the difference.
References
- MiniMax M2.5 Official Announcement – Primary source for benchmark claims and pricing
- MiniMax Forge Technical Documentation – Details on the RL framework and training methodology
- Reuters IPO Coverage – Financial context for MiniMax’s public listing
- Burnwise Frontier Model Pricing – Independent pricing comparison across frontier models
- Xinhua Chinese AI Sector Report – Broader context on Chinese AI releases including Zhipu GLM-5
I write about AI model releases and their market implications regularly. If this kind of breaking coverage is useful to you, consider subscribing so you don’t miss the next one.