MiniMax M2.5: Game-Changer with 80% Coding Benchmark Score


34 days after a $619M IPO, MiniMax claims SOTA coding performance at a fraction of competitor pricing

If you want these landing in your inbox regularly, subscribe to my newsletter.


The news

MiniMax released M2.5 on February 12, 2026, and the official announcement makes a claim that would have seemed absurd six months ago: a Chinese model matching Claude Opus 4.6 on coding benchmarks at roughly one-twentieth the output price.

The numbers are stark. SWE-Bench Verified: 80.2%. That puts M2.5 in a statistical dead heat with Anthropic’s flagship coding model. Multi-SWE-Bench: 51.3%. BrowseComp: 76.3% with context management. These are not mid-tier results.

[Figure: MiniMax leads benchmarks – bar graph comparing MiniMax M2.5, Claude Opus 4.6, GPT-5.1, and Gemini 3 Pro on SWE-Bench Verified, Multi-SWE-Bench, and BrowseComp.]

Two variants launched simultaneously. The standard M2.5 costs $0.15 per million input tokens and $1.20 per million output tokens, generating at 50 tokens per second. M2.5-Lightning doubles those prices and doubles the speed, hitting 100 tokens per second at $0.30/M input and $2.40/M output.
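
To make the speed-versus-price tradeoff concrete, here is a small sketch of what those throughput figures mean per response; the 2,000-token response length is an assumed example, not a MiniMax figure.

```python
# What the two variants' throughput and pricing mean for a single
# response. The 2,000-token response length is an assumed example.
VARIANTS = {
    # name: (output $/M tokens, tokens per second)
    "M2.5":           (1.20, 50),
    "M2.5-Lightning": (2.40, 100),
}

response_tokens = 2_000  # assumed response length

for name, (price_per_m, tps) in VARIANTS.items():
    seconds = response_tokens / tps
    cost = response_tokens / 1_000_000 * price_per_m
    print(f"{name}: {seconds:.0f}s to generate, ${cost:.4f} per response")
```

Lightning halves the wait and doubles the bill; for interactive agent loops, that trade often favours speed.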


The pricing earthquake

The price gap goes well beyond a rounding error.

| Model | Input ($/M) | Output ($/M) | Output ratio vs M2.5 |
| --- | --- | --- | --- |
| MiniMax M2.5 | $0.15 | $1.20 | 1x |
| MiniMax M2.5-Lightning | $0.30 | $2.40 | 2x |
| DeepSeek-R1 | $0.55 | $2.19 | 1.8x |
| GPT-5.1 | $1.25 | $10.00 | 8.3x |
| Gemini 3 Pro | $2.00 | $12.00 | 10x |
| Claude Opus 4.5 | $5.00 | $25.00 | 20.8x |

MiniMax puts it bluntly in their technical documentation: the cost of M2.5 is one-tenth to one-twentieth that of Opus, Gemini 3 Pro, and GPT-5 based on output price. The math checks out.

For a developer running heavy inference workloads, this is the difference between a $500 monthly bill and a $10,000 one: Claude Opus is roughly twenty times more expensive on output. At scale, infrastructure costs that once required budget approval become line items small enough to expense without a second thought.
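
A quick back-of-the-envelope script makes that arithmetic checkable, using the output prices from the table above; the 400-million-token monthly volume is an illustrative assumption for a heavy workload, not a usage figure from MiniMax.

```python
# Monthly output-token cost comparison using the prices from the
# table above. The 400M-token monthly volume is an assumption chosen
# to illustrate a heavy inference workload.
OUTPUT_PRICE_PER_M = {  # $ per million output tokens
    "MiniMax M2.5": 1.20,
    "MiniMax M2.5-Lightning": 2.40,
    "DeepSeek-R1": 2.19,
    "GPT-5.1": 10.00,
    "Gemini 3 Pro": 12.00,
    "Claude Opus 4.5": 25.00,
}

monthly_output_tokens_m = 400  # million tokens per month (assumed)

for model, price in OUTPUT_PRICE_PER_M.items():
    cost = price * monthly_output_tokens_m
    ratio = price / OUTPUT_PRICE_PER_M["MiniMax M2.5"]
    print(f"{model:24s} ${cost:>9,.2f}/mo  ({ratio:.1f}x M2.5)")
```

At that volume, M2.5 comes in around $480 a month against $10,000 for Opus, which is where the $500-versus-$10,000 framing comes from.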

[Figure: AI cost efficiency – output pricing per million tokens: MiniMax M2.5 ($1.20), DeepSeek-R1 ($2.19), GPT-5.1 ($10.00), Gemini 3 Pro ($12.00), Claude Opus 4.5 ($25.00).]

The Burnwise pricing tracker confirms these figures align with current market rates for frontier models. MiniMax is not discounting into oblivion; they are pricing at a structural level that Western competitors have not touched.


Benchmark claims and verification caveats

80.2% on SWE-Bench Verified places M2.5 at the frontier. The HuggingFace model card provides methodology details, but independent verification remains thin.

Standard disclaimers apply. Benchmark gaming is an industry sport. SWE-Bench has seen contested results where aggressive test-time compute and prompt engineering inflate scores that don’t translate to real-world performance. MiniMax has opened model weights for scrutiny, which is more than some competitors offer, but the broader research community has not yet stress-tested these claims.

What is verifiable: the model is available through OpenRouter for immediate use. Developers can run their own evaluations. The pricing is real. The benchmarks are claims that may or may not survive third-party replication.
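
For anyone who wants to run those evaluations, OpenRouter exposes an OpenAI-compatible chat completions endpoint, so the standard openai client works against it. A minimal sketch follows; the model slug `minimax/minimax-m2.5` is an assumption based on OpenRouter's vendor/model naming convention, so confirm it on the model page first.

```python
# Minimal OpenRouter call for hands-on evaluation. OpenRouter's API
# is OpenAI-compatible; the model slug below is an assumption based
# on OpenRouter's naming convention -- verify it before relying on it.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",  # from your OpenRouter account
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.5",  # assumed slug; confirm before use
    messages=[{
        "role": "user",
        "content": "Write a function that reverses a linked list in place.",
    }],
)
print(response.choices[0].message.content)
```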

“Based on output price, the cost of M2.5 is one-tenth to one-twentieth that of Opus, Gemini 3 Pro, and GPT-5.” — MiniMax technical documentation


The Forge framework

MiniMax credits a custom reinforcement learning infrastructure called Forge for the performance gains. Their post-training writeup describes a 40x training speedup over baseline approaches, enabled by 200,000+ training environments.

The technical claim addresses the compute bottleneck. If Forge delivers 40x efficiency gains, the economics of frontier model training shift. A training run that would cost $100 million at standard efficiency might cost $2.5 million with Forge-style optimisation. That's the difference between a project that needs a nine-figure raise and one that fits in a Series A budget.

Forge focuses on agentic capabilities, training models to use tools, browse the web, and manage long-horizon tasks. The BrowseComp score of 76.3% reflects this focus. MiniMax is building agent infrastructure where the model serves as the reasoning core.
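
MiniMax has not published Forge's internals, but the agent pattern it targets is well established: the model proposes a tool call, the harness executes it, and the observation is appended to the transcript until the task completes or a step budget runs out. A schematic sketch of that loop, with a hypothetical `model.next_action` interface and hypothetical tool names, looks like this:

```python
# Schematic tool-use loop of the kind Forge reportedly trains for.
# The Action type and model.next_action interface are hypothetical
# illustrations, not MiniMax's actual API.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Action:
    name: str                  # e.g. "browse", "run_code", "finish"
    arguments: dict[str, Any]

def run_agent(model, task: str, tools: dict[str, Callable],
              max_steps: int = 20):
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action: Action = model.next_action(transcript)
        if action.name == "finish":
            return action.arguments.get("answer")
        observation = tools[action.name](**action.arguments)  # run the tool
        transcript.append({"role": "tool", "name": action.name,
                           "content": observation})
    return None  # long-horizon step budget exhausted
```

The 40x speedup claim presumably comes down to running many more of these rollouts per unit of compute across those 200,000+ environments.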

[Figure: Forge RL framework – 200K+ training environments feed reward signals and model optimisation into M2.5, which drives tool use, web browsing, and long-horizon tasks.]

The platform claims 10,000+ user-built “Experts” on their MiniMax Agent system. These are custom workflows and tools built by developers using the underlying model. The number suggests genuine adoption rather than API keys claimed and then left unused.


The Chinese AI wave context

MiniMax didn’t emerge from nowhere. The company went public on January 9, 2026, raising $619 million in a Hong Kong IPO that valued the company at over $11.5 billion on debut. The stock jumped 109% on its first trading day. Forbes reported that founder Yan Junjie became a billionaire in the process.

Thirty-four days passed between that IPO and the M2.5 release. That timing isn’t coincidental. Public markets demand growth narratives, and delivering a frontier model at disruptive pricing within weeks of listing sends a clear signal to investors about R&D velocity.

MiniMax isn’t alone either. Xinhua reports that Zhipu released GLM-5 on February 11, one day before MiniMax’s announcement. The Chinese AI sector is coordinating releases in ways that suggest both state support and intense domestic competition. Western labs now face simultaneous pressure from multiple Chinese competitors rather than a single DeepSeek-style surprise.

Hacker News discussion captured the shift. One commenter called MiniMax their “fast workhorse for tool calling.” Another noted that Chinese models are delivering “high quality drops for the perfect trifecta of leading models.” Developer sentiment has moved from skepticism to qualified acceptance.

[Figure: AI innovation wave timeline – DeepSeek-R1 (Q4 2025), MiniMax IPO (January 9, 2026), Zhipu GLM-5 (February 11, 2026), MiniMax M2.5 (February 12, 2026).]

What to watch next

Three questions will determine whether M2.5 reshapes the market or becomes another discounted model fighting for scraps.

First, does independent benchmarking confirm the 80.2% SWE-Bench score? If third-party evaluations show regression to 70% or below, the pricing advantage matters less. Developers will pay premium rates for reliability; they won’t pay any rate for uncertainty.

Second, how do Western labs respond? Anthropic, OpenAI, and Google have not faced genuine price competition from Chinese models with claimed frontier performance. A price war would compress margins across the industry. The alternative is segmentation, where Western labs retreat to enterprise contracts and regulatory moats while Chinese models dominate price-sensitive markets.

Third, what does MiniMax ship next? A single model release proves capability. A sustained cadence of improvements proves the underlying infrastructure works. The 34-day gap between IPO and M2.5 suggests velocity, but velocity needs to continue.

The pricing is real. The benchmarks are claims. The market will sort the difference.


I write about AI model releases and their market implications regularly. If this kind of breaking coverage is useful to you, consider subscribing so you don’t miss the next one.