MiniMax M3: Frontier Coding Model at 5% of Claude’s Price

MiniMax M3 launched today as the first open-weights model to combine three things that used to require a closed proprietary system: frontier coding performance, a one-million-token context window, and native multimodality. The API is live now. Open weights arrive in roughly ten days. Standard pricing is $0.60 per million input tokens — about 12% of what Claude Opus charges. For developers spending thousands per month on AI API costs, that math is worth paying attention to.

The benchmark case

MiniMax M3 scores 59.0% on SWE-Bench Pro, the benchmark that uses real GitHub issues from production repositories rather than synthetic problems. That number beats GPT-5.5 and Gemini 3.1 Pro. It trails Claude Opus 4.7, but the gap is smaller than the price gap would suggest.

On autonomous web browsing (BrowseComp), M3 scores 83.5 — above Opus 4.7’s 79.3. On Terminal-Bench 2.1, which tests agentic CLI task completion, it scores 66.0%. The MCP tool-use benchmark (MCP Atlas) comes in at 74.2%.

The long-horizon demos are harder to summarize but more revealing. MiniMax ran M3 on a 12-hour paper reproduction task — it autonomously reproduced an ICLR 2025 Outstanding Paper Award winner, generating 18 commits and 23 experimental figures without human intervention. In a separate test, M3 optimized a CUDA kernel from 7.6% to 71.3% Hopper GPU hardware utilization over 24 hours and 147 attempts, with no reference solution provided. These are not benchmark numbers. They are evidence that the model can sustain coherent long-horizon work.

What you actually pay

Here is the current pricing landscape for frontier-class coding models:

Model	SWE-Bench Pro	Input per 1M tokens	Context
MiniMax M3	59.0%	$0.60 (std) / $0.30 (promo)	1M tokens
Claude Opus 4.7	~63%	$5.00	200K tokens
GPT-5.5	~55%	~$7.50	128K tokens
Gemini 3.1 Pro	~53%	~$3.50	1M tokens

A task that processes 500,000 input tokens and generates 100,000 output tokens costs roughly $0.27 on M3 at promotional pricing, or $0.54 at standard rates. The same task on Claude Opus costs approximately $5.00. That is not a small difference for teams running hundreds or thousands of agent calls per day.

Why the context window actually works here

One-million-token context windows have become a common spec claim, but most implementations are slow enough at that scale to be impractical for production use. MiniMax’s architecture addresses this directly.

M3 uses MiniMax Sparse Attention (MSA), which replaces standard full attention — where every token is compared to every other token in O(n²) time — with a block-selection approach. Before processing, MSA pre-selects which key-value cache blocks are relevant to each query. Each selected block is read once with contiguous memory access, avoiding the scattered random reads that make full attention expensive at long context.

The measured result: at one million tokens, M3 achieves 9.7x faster prefill and 15.6x faster decoding compared to the previous M2 generation, at one-twentieth the per-token compute. It runs 4x faster than the best competing open-source sparse attention implementation. That means the 1M context window is genuinely usable for processing large codebases in a single call, not just a number on a comparison slide.

How to access it now

Three paths are available today:

MiniMax API directly: The endpoint at api.minimax.io uses an OpenAI-compatible interface. Any code already calling OpenAI’s completions API can be pointed at MiniMax with a model name swap and a new API key.

OpenRouter: If you want to test without creating a MiniMax account, OpenRouter has M3 available at promotional pricing. Change the endpoint, pass minimax/minimax-m3 as the model, and the rest of your code stays identical.

Open weights: Not available at launch. MiniMax has committed to publishing weights on Hugging Face and GitHub within ten days. Once available, the model can run on vLLM or SGLang with MSA support — at which point the per-token cost becomes zero for self-hosted deployments. That last point is the one to watch.

The practical routing strategy

M3 is not a Claude replacement for every task. For multi-file refactoring on complex legacy codebases, or decisions where the cost of a model error is high, the small quality gap between M3 and Opus 4.7 still matters. But for the large volume of straightforward coding tasks — generating boilerplate, reviewing PRs, answering questions about large codebases, parsing long documents — the cost advantage is hard to argue against.

The pattern that makes sense: route the bulk of volume to M3, escalate genuinely difficult edge cases to a more expensive frontier model. Teams using this approach report 70–85% reductions in total AI API spend without a perceptible drop in output quality on production paths.

What to verify before committing

Two things worth checking before you build a production dependency on M3. First, all benchmark numbers are vendor-reported. Independent community evaluation will run over the coming weeks, and it may tell a different story on certain task types. Test M3 on your actual workloads before assuming SWE-Bench Pro numbers transfer directly.

Second, review the open-weights license terms before commercial deployment. MiniMax permits commercial use, but there are conditions. The full technical report and license publish with the weights. Read them before you ship.

M3 also joins a growing list of capable open-weights models from Chinese labs — DeepSeek, Qwen — that have consistently surprised Western developers with their price-to-performance ratio. The pattern is now predictable enough that assuming any proprietary frontier model holds a permanent cost-performance monopoly is a planning mistake. The Decoder’s analysis covers the competitive positioning in more depth for those tracking the broader landscape.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.