MiniMax M3: Open-Weight 1M Context Model, Weights Due Around June 11


MiniMax M3: open-weight frontier model with 1M-token context and MSA architecture

MiniMax dropped M3 on June 1 with a three-for-one claim that most labs would spread across three separate announcements: frontier-level coding performance, a genuine 1-million-token context window, and native multimodality — all in a single open-weight model. The API is live now. The weights land on Hugging Face around June 11. And there is one caveat serious enough that you should read it before designing any production system around self-hosting.

The Architecture Is the Real Story

The headline numbers only make sense once you understand MiniMax Sparse Attention (MSA), the architectural innovation that makes 1M context economically viable rather than technically theatrical.

Standard full attention has quadratic computational cost: double the context length, quadruple the compute. At 1M tokens that math is brutal. MSA solves it with a two-stage mechanism. A lightweight index branch scans incoming tokens first and selects which blocks of the key-value cache are actually relevant. Full attention then runs only on those selected blocks — not the entire context. MiniMax reports this delivers 9x faster prefill, 15x faster decoding, and 1/20th the per-token compute of M2 at 1M-token context.
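The two-stage idea described above can be sketched in a few lines of plain Python. This is a toy illustration only, not MiniMax's implementation: the article does not describe how the index branch scores blocks, so scoring each block by the dot product of the query with the block's mean key is an assumption, and the names `select_blocks`, `sparse_attention`, `block_size`, and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k):
    """Stage 1 (index branch, ASSUMED scoring): rank blocks by query . mean(key)."""
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), start))
    best = sorted(scored, reverse=True)[:top_k]
    return sorted(start for _, start in best)


def sparse_attention(query, keys, values, block_size, top_k):
    """Stage 2: ordinary softmax attention, restricted to the selected blocks."""
    starts = select_blocks(query, keys, block_size, top_k)
    picked = [i for s in starts for i in range(s, min(s + block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in picked])
    out = [sum(w * values[i][d] for w, i in zip(weights, picked))
           for d in range(len(values[0]))]
    return out, picked


# Eight cached tokens in four blocks of two; the query points along the second axis.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.0, -1.0], [0.0, -1.0]]
values = [[float(i)] for i in range(len(keys))]
query = [0.0, 1.0]

out, picked = sparse_attention(query, keys, values, block_size=2, top_k=1)
print(picked)  # [2, 3]
print(out)     # [2.5]
```

In this sketch the second stage touches `top_k * block_size` cached tokens instead of all of them, which is where the savings would come from as the context grows. Setting `top_k` to the total number of blocks reduces it to ordinary full attention.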

The comparison worth making is to DeepSeek’s Multi-head Latent Attention (MLA), which takes a different approach — compressing key-value pairs into a low-dimensional latent space. MSA instead keeps KV uncompressed and uses block-level selection. What matters for developers is that MSA produces a model that can maintain useful long-context performance at a price point where 1M-token requests are not reserved for funded AI labs.

On the Benchmarks: Keep Your Head

MiniMax reports 59.0% on SWE-Bench Pro — ahead of GPT-5.5 at 58.6% and Gemini 3.1 Pro at 54.2%. That is genuinely impressive for an open-weight model. But before you redeploy your stack around it, the context matters.

Model	SWE-Bench Pro	Source
Claude Opus 4.8	69.2%	Anthropic (vendor)
MiniMax M3	59.0%	MiniMax (vendor)
GPT-5.5	58.6%	OpenAI (vendor)
Gemini 3.1 Pro	54.2%	Google (vendor)

Every number in that table is vendor-reported, run on the company’s own infrastructure, with baselines they chose. MiniMax ran its evaluation using Claude Code as scaffolding — which is standard practice, not cheating — but it also compared M3 against Opus 4.7 rather than Opus 4.8, which shipped three days before M3 launched. Against the current frontier, M3 trails by roughly 10 points. Independent third-party scores from Artificial Analysis and LMArena had not been published at launch.

The honest framing: M3 reaches frontier range on the headline coding benchmark. That is a meaningful step for an open model. It is not a claim to the top spot.

The Pricing Changes the Math

Where M3 makes a stronger case is cost. At launch-week promotional pricing, it runs at $0.30 per million input tokens and $[?].20 per million output tokens, roughly 15x cheaper than Claude Opus ($[?]/$[?]5) and over 25x cheaper than GPT-5.5 (~$[?]0/$[?]0) on the input side. Standard pricing after the promotion is $0.60/$[?].40, which remains far below closed frontier alternatives.

Model	Input ($/M tokens)	Output ($/M tokens)
MiniMax M3 (promo)	$0.30	$[?].20
MiniMax M3 (standard)	$0.60	$[?].40
Claude Opus 4.8	$[?].00	$[?]5.00
GPT-5.5	~$[?]0.00	~$[?]0.00

For pipelines processing long documents, large codebases, or running bulk agentic workflows, that price gap is decisive. The emerging pattern among developers evaluating M3 is hybrid routing: use M3 for cost-sensitive, high-volume, long-context work, and route to a closed frontier model for the narrow slice where the last quality points actually matter.
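A minimal sketch of that hybrid routing pattern might look like the following. The model ID, the 1M-token limit, and the M3 input prices come from this article; everything else is an assumption for illustration, including the function names `pick_model` and `input_cost_usd`, the `needs_top_quality` flag, and the frontier model ID and context limit, which the caller must supply.

```python
M3_MODEL_ID = "minimax/minimax-m3"   # OpenRouter ID quoted in this article
M3_CONTEXT_LIMIT = 1_000_000         # tokens, per the article
M3_INPUT_PRICE_PROMO = 0.30          # USD per million input tokens (launch promo)
M3_INPUT_PRICE_STANDARD = 0.60       # USD per million input tokens (standard)


def input_cost_usd(prompt_tokens, price_per_million):
    """Input-side cost only; output tokens are billed separately."""
    return prompt_tokens / 1_000_000 * price_per_million


def pick_model(prompt_tokens, needs_top_quality,
               frontier_model_id, frontier_context_limit):
    """Hypothetical routing rule: default to M3, escalate only when asked."""
    if prompt_tokens > M3_CONTEXT_LIMIT:
        raise ValueError("prompt exceeds the 1M-token window; split it first")
    if needs_top_quality and prompt_tokens <= frontier_context_limit:
        return frontier_model_id
    return M3_MODEL_ID


# Placeholder values for the closed model; substitute your provider's real ones.
FRONTIER_ID = "frontier-model-placeholder"
FRONTIER_LIMIT = 200_000

print(pick_model(600_000, False, FRONTIER_ID, FRONTIER_LIMIT))  # minimax/minimax-m3
print(pick_model(20_000, True, FRONTIER_ID, FRONTIER_LIMIT))    # frontier-model-placeholder
print(f"{input_cost_usd(600_000, M3_INPUT_PRICE_PROMO):.2f}")   # 0.18
```

A real router would likely decide `needs_top_quality` from task type or a failed first attempt rather than a hand-set flag; the point here is only that the escalation path stays narrow and the bulk traffic stays on the cheaper model.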

How to Access MiniMax M3 Now

The API is available through multiple routes. The native endpoint is https://api.minimax.io/v1/text/chatcompletion_v2. On OpenRouter the model ID is minimax/minimax-m3, which lets you use the OpenAI SDK with a base URL swap:

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[{"role": "user", "content": "Your prompt here"}]
)
print(response.choices[0].message.content)
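The call above sends a short prompt. For long-context work, it can help to check the request against the 1M-token window before sending it. The sketch below is one way to do that under stated assumptions: the four-characters-per-token ratio is a rough heuristic rather than M3's actual tokenizer, and `estimate_tokens`, `build_messages`, and `reserve_for_output` are hypothetical names.

```python
import math

CONTEXT_LIMIT_TOKENS = 1_000_000  # window size reported in this article
CHARS_PER_TOKEN = 4               # ASSUMPTION: rough heuristic, not the real tokenizer


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def build_messages(documents, question, reserve_for_output=8_000):
    """Pack named documents plus a question into one user message.

    Raises ValueError if the estimated prompt, plus room reserved for the
    reply, would not fit in the context window.
    """
    parts = [f"### {name}\n{body}" for name, body in documents.items()]
    prompt = "\n\n".join(parts + [question])
    needed = estimate_tokens(prompt) + reserve_for_output
    if needed > CONTEXT_LIMIT_TOKENS:
        raise ValueError(f"estimated {needed} tokens exceeds {CONTEXT_LIMIT_TOKENS}")
    return [{"role": "user", "content": prompt}]


messages = build_messages({"app.py": "print('hello')"}, "What does this file do?")
print(messages[0]["role"])  # user
```

The returned list has the same shape as the `messages` argument in the example above, so it can be passed to `client.chat.completions.create` directly. Because the estimate is approximate, leave generous headroom or count tokens with the provider's tooling when exact limits matter.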

The model is also available on SiliconFlow and Together.ai, and MiniMax Code provides a direct interactive interface. For local deployment, Ollama has listed the model in its library ahead of the weight release.

Open Weights and the License You Need to Read

MiniMax committed to publishing model weights and a technical report on Hugging Face (watch huggingface.co/MiniMaxAI) and GitHub within ten days of launch — targeting around June 11.

One thing to check the moment the weights ship: the license. MiniMax’s prior model, M2.7, launched under a “Modified-MIT” license that restricted commercial use without written authorization. That restriction covered products charged to third parties, API interfaces, and post-fine-tuning deployment for profit. It was criticized as faux open-source, and deserved that criticism. Whether M3 ships with a similar restriction or a genuinely permissive license is unknown at time of writing.

If your plan involves self-hosting M3 for a commercial product, read the license terms before you architect around it. The weights being available for download does not automatically mean they are free to deploy commercially.

Who Should Pay Attention

M3 is worth evaluating immediately if you are running agentic coding pipelines where cost is a constraint, processing long documents or codebases at scale, or building multimodal workflows that need text, image, and video in a single context. It is less compelling if you need the absolute ceiling on coding quality right now — Opus 4.8’s 10-point benchmark lead is real, and until independent scores validate M3’s numbers, some skepticism is warranted.

The weights landing in the next week changes the picture for self-hosters. When they arrive, read the license, run your own evals, and decide then.

ByteBot
