MiniMax M3: Open-Weight Model That Beats GPT-5.5 on Coding


MiniMax M3: 428B MoE model with MiniMax Sparse Attention, 1M token context window

On June 7, MiniMax released the weights for M3, a 428-billion-parameter mixture-of-experts model that scores 59.0% on SWE-Bench Pro, edging past GPT-5.5’s 58.6%, at $0.30 per million input tokens. That’s roughly 16 times cheaper than GPT-5.5 ($5.00) and about 50 times cheaper than Claude Opus 4.8 (~$15). There’s a catch, though: at launch MiniMax compared M3 against Opus 4.7, leaving out Opus 4.8 (released three days earlier), which scores 69.2% on the same benchmark. M3 isn’t a clean sweep, but at that price it doesn’t need to be.

What M3 Actually Is

M3 has 428 billion parameters, of which roughly 23 billion are active per token thanks to mixture-of-experts routing. The headline innovation is MiniMax Sparse Attention (MSA), a replacement for standard full attention that pre-filters the relevant KV-cache blocks instead of attending across every token. At one million tokens of context, MiniMax reports 9x faster prefill, 15x faster decoding, and one-twentieth the compute per token compared with M2. The model also accepts image and video input natively and can operate a desktop computer in agentic tasks.
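To make “pre-filtering KV-cache blocks” concrete, here is a toy, single-query sketch of generic block-sparse attention. The article doesn’t describe how MSA actually scores or selects blocks, so the mean-of-keys block summary, the `top_k` parameter, and all function names below are assumptions for illustration, not MiniMax’s implementation.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(q, keys, values):
    """Single-query scaled dot-product attention over every key."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Toy block-sparse attention: rank KV blocks by a cheap summary score,
    keep the top_k blocks, and run full attention only over their tokens."""
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(idx):
        # Assumed summary: the mean of the block's keys. A real system would
        # cache these summaries instead of recomputing them for every query.
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(len(q))]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    kept = sorted(ranked[:top_k])  # restore original token order
    selected = [i for b in kept for i in blocks[b]]
    return full_attention(q, [keys[i] for i in selected], [values[i] for i in selected])
```

With `top_k` at least as large as the number of blocks, the result matches full attention exactly. Shrinking `top_k` means attending over fewer tokens per query, which is the kind of saving that matters once the context reaches a million tokens.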

Prior long-context models could technically fit a million tokens. M3 makes it economical to actually use them. That’s the architectural bet MiniMax is making: as agents need to hold entire codebases, conversation histories, and document sets in memory simultaneously, the efficiency of the attention mechanism stops being academic. MiniMax’s technical report shows M3 autonomously reproducing a research paper in 12 hours with 18 code commits, and optimizing a CUDA kernel from 7.6% to 71.3% hardware utilization across 147 iterations.

The Benchmark Picture—Honest Version

Here’s what the numbers actually say:

Model	SWE-Bench Pro	BrowseComp	PostTrainBench score (rank)	Input price ($/M tokens)
Claude Opus 4.8	69.2%	~79%	0.42 (1st)	~$15
MiniMax M3	59.0%	83.5%	0.37 (3rd)	$0.30
GPT-5.5	58.6%	N/A	0.39 (2nd)	$5.00

M3 beats GPT-5.5 on coding by a narrow 0.4-point margin and leads all models on BrowseComp, which measures autonomous web-agent tasks. But Opus 4.8 leads SWE-Bench Pro by about 10 points, and PostTrainBench (general instruction following) puts M3 third. MiniMax’s own launch benchmarks cherry-picked Opus 4.7 as the comparison target. For an independent read on real-world latency and throughput, check OpenRouter’s live stats.

The Price Math for Production

For teams running AI agents in production, token costs add up fast. Consider a coding agent that processes 10 million input tokens per day:

M3: $3/day
GPT-5.5: $50/day
Opus 4.8: $150/day

Over 30 days, M3 costs $90 for the same input volume that costs $1,500 on GPT-5.5 or $4,500 on Opus 4.8. For high-volume bug-fix pipelines, automated code review, or eval harnesses, that gap is hard to ignore. M3 is a sensible default when you need frontier-range coding ability and Opus-tier polish isn’t strictly required.
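The arithmetic above is easy to rerun for your own volumes. A minimal sketch, using the input prices from the table (with Opus 4.8 at its approximate ~$15 figure) and ignoring output-token pricing, which isn’t listed here:

```python
# Input prices in USD per million tokens, taken from the comparison table.
# The Opus 4.8 figure is listed there as approximate.
PRICE_PER_M_INPUT = {
    "MiniMax M3": 0.30,
    "GPT-5.5": 5.00,
    "Claude Opus 4.8": 15.00,
}


def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a per-million price."""
    return tokens / 1_000_000 * price_per_million


def monthly_costs(tokens_per_day: int, days: int = 30) -> dict[str, float]:
    """Input-token cost per model over `days` days, rounded to cents."""
    return {
        model: round(input_cost(tokens_per_day, price) * days, 2)
        for model, price in PRICE_PER_M_INPUT.items()
    }


if __name__ == "__main__":
    for model, cost in monthly_costs(10_000_000).items():
        print(f"{model}: ${cost:,.2f} per 30 days")
```

For 10 million tokens a day this reproduces the $90, $1,500, and $4,500 figures; swap in your own daily volume to see where the gap starts to matter.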

Using M3 Today

The integration path is minimal. M3 exposes an OpenAI-compatible endpoint, so existing OpenAI SDK code only needs a new base URL, API key, and model name:

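import os

from openai import OpenAI

# Point the standard OpenAI client at MiniMax's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.minimax.io/v1",
    # Read the key from an environment variable (the name is your choice)
    # instead of hardcoding it in source.
    api_key=os.environ["MINIMAX_API_KEY"],
)
response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[{"role": "user", "content": "Refactor this function..."}],
)
print(response.choices[0].message.content)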

LangChain integration works the same way: point ChatOpenAI at the MiniMax base URL. The weights are live on Hugging Face for self-hosting. You’ll need roughly 440GB of storage for the FP8 checkpoint and at least eight high-end GPUs running with tensor parallelism. SGLang has official M3 support; vLLM works once MSA support is enabled. Mac Studio deployments via llama.cpp are possible, but expect practical context limits below the 1M maximum.
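The storage and GPU figures follow from simple arithmetic: FP8 stores one byte per parameter. A rough sketch, assuming an even tensor-parallel split across eight GPUs; it counts weights only, not KV cache or activations, and the BF16 line is a hypothetical comparison, not a checkpoint the article mentions. The gap between 428 GB and the ~440GB figure would come from metadata, scale factors, or tensors kept at higher precision; that breakdown is an assumption, not something MiniMax has stated.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Raw weight size in decimal GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and framework overhead."""
    return params_billion * bytes_per_param


def per_gpu_weight_gb(total_gb: float, tensor_parallel: int) -> float:
    """Weight share per GPU, assuming tensor parallelism splits weights evenly."""
    return total_gb / tensor_parallel


if __name__ == "__main__":
    # FP8 is the published checkpoint format; BF16 is shown only for comparison.
    for label, nbytes in [("FP8", 1.0), ("BF16", 2.0)]:
        total = weight_memory_gb(428, nbytes)
        print(f"{label}: ~{total:.0f} GB total, "
              f"~{per_gpu_weight_gb(total, 8):.1f} GB per GPU at TP=8")
```

This prints about 428 GB total and 53.5 GB per GPU for FP8. Whatever memory remains on each card is what you have for KV cache, which is why long contexts eat into the budget quickly on smaller setups.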

Two Things to Know Before You Commit

First, the license. M3 ships under the MiniMax Community License, not Apache 2.0. Commercial use restrictions may apply to your use case. Read the terms before building a product on the weights—“open weights” and “fully open source” are not the same thing.

Second, context discipline. A one-million-token window is not an invitation to stuff everything in. Every token costs money, and filling the context for tasks that don’t need it inflates cost without improving output. Use the window when the task demands it: long document analysis, full-codebase context, multi-hour agentic runs. For standard coding tasks, a shorter context with the same model often delivers the same result at a fraction of the cost.
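A minimal sketch of that discipline, assuming OpenAI-style message dicts and a crude four-characters-per-token estimate (not M3’s real tokenizer). The helper names are hypothetical: keep the system prompt, keep the newest turns that fit a budget you choose, and estimate what each request costs at the $0.30/M input rate.

```python
import math

# Rough heuristic (assumption): about 4 characters per token for English text
# and code. Use the provider's actual tokenizer when you need exact counts.
CHARS_PER_TOKEN = 4
M3_INPUT_PRICE_PER_M = 0.30  # USD per million input tokens, from the table


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def trim_to_budget(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep a leading system message plus the most recent messages that fit."""
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards so the newest turns are kept first; stop at the first
    # message that would overflow so the kept history stays contiguous.
    for msg in reversed(messages[len(system):]):
        cost = estimate_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return system + kept[::-1]


def estimated_input_cost(messages: list[dict], price_per_m: float = M3_INPUT_PRICE_PER_M) -> float:
    """Estimated input cost in USD for a single request."""
    tokens = sum(estimate_tokens(m["content"]) for m in messages)
    return tokens / 1_000_000 * price_per_m
```

At $0.30 per million input tokens, a request that fills the full window costs about $0.30 in input alone, and an agent loop that resends that window on every turn pays it on every call. Dropping old turns is the bluntest option; summarizing them is a common alternative.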

Bottom Line

M3 is the right call for production coding agents and long-horizon agentic pipelines where Opus 4.8 is the quality ceiling but the budget says otherwise. It’s not a replacement for Opus on general instruction tasks. The benchmark cherry-picking at launch is a yellow flag—MiniMax knew the table looked cleaner without Opus 4.8 in it. But the underlying value holds: frontier-range coding at 16x lower input cost, open weights with self-hosting options, and a million-token context window that’s architecturally efficient rather than just technically possible. For cost-sensitive teams building on top of LLMs, that’s worth testing today.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.
