
On April 24, DeepSeek dropped the preview of V4-Pro — a 1.6-trillion-parameter, MIT-licensed model that scores 80.6% on SWE-bench Verified, sits within 0.2 points of Claude Opus 4.6, and costs $3.48 per million output tokens versus Claude’s $25. That seven-fold price gap, against near-identical benchmark performance on coding tasks, is the most significant open-source AI development this quarter. If you’re paying closed-model rates for coding workloads, that choice just got harder to justify.
The Numbers That Matter
DeepSeek V4-Pro is a Mixture-of-Experts (MoE) model: 1.6 trillion total parameters, 49 billion activated per token. It ships with a 1-million-token context window across all providers. On the benchmarks developers actually care about:
| Model | SWE-bench Verified | LiveCodeBench | Output ($/1M tokens) | License |
|---|---|---|---|---|
| DeepSeek V4-Pro | 80.6% | 93.5% | $3.48 | MIT |
| Claude Opus 4.6 | 80.8% | — | $25.00 | Proprietary |
| GPT-5.5 | ~82% | — | $30.00 | Proprietary |
| DeepSeek V4-Flash | 79.0% | 91.6% | ~$1.50 | MIT |
V4-Pro also posts a Codeforces rating of 3206, the highest ever recorded for an AI model. V4-Flash, the smaller 284B-parameter sibling, trails V4-Pro by just 1.6 points on SWE-bench. A gap that small is unlikely to show up on most real-world coding tasks, which makes Flash compelling for high-throughput, cost-sensitive pipelines.
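To see what the price column means for an actual bill, here is a minimal sketch using the output prices from the table. The 500M-tokens-per-month workload is a made-up figure for illustration, and V4-Flash is left out because its price is only approximate.

```python
# Output prices in USD per 1M tokens, copied from the table above.
OUTPUT_PRICE_PER_M = {
    "DeepSeek V4-Pro": 3.48,
    "Claude Opus 4.6": 25.00,
    "GPT-5.5": 30.00,
}


def monthly_output_cost(model: str, output_tokens: int) -> float:
    """Return the output-token bill in USD for the given token volume."""
    return OUTPUT_PRICE_PER_M[model] * output_tokens / 1_000_000


# Hypothetical workload (an assumption, not a figure from the article).
MONTHLY_OUTPUT_TOKENS = 500_000_000

baseline = OUTPUT_PRICE_PER_M["DeepSeek V4-Pro"]
for model, price in OUTPUT_PRICE_PER_M.items():
    cost = monthly_output_cost(model, MONTHLY_OUTPUT_TOKENS)
    print(f"{model:<16} ${cost:>10,.2f}  ({price / baseline:.1f}x V4-Pro)")
```

Swap in your own monthly volume. Input-token and cached-token pricing are not modeled here, so treat the result as a floor on the difference rather than a full invoice.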
The License Is the Real Story
Benchmark tables are easy to scan. The MIT license is worth dwelling on. MIT means commercial use, modification, and redistribution, with no data-sharing obligations or vendor restrictions. The weights are on Hugging Face and ModelScope today.
For teams operating under data residency requirements, handling sensitive code, or simply tired of the usage-limit roulette that comes with managed APIs, that combination — frontier-adjacent coding performance plus MIT weights — is a genuinely new option. Previously, self-hosting anything near this performance tier was not viable. Now it is.
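"Viable" still means serious hardware. As a rough sizing sketch, the snippet below estimates the raw weight footprint from the parameter counts quoted earlier. The bytes-per-parameter values are standard sizes for each format; which precision you would actually serve at is an assumption, and KV cache, activations, and runtime overhead are not included.

```python
TOTAL_PARAMS = 1.6e12   # total parameters, from the article
ACTIVE_PARAMS = 49e9    # parameters activated per token, from the article

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_footprint_tb(params: float, fmt: str) -> float:
    """Approximate size of the raw weights in terabytes (1 TB = 1e12 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e12


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: ~{weight_footprint_tb(TOTAL_PARAMS, fmt):.1f} TB of weights")

# Only a small slice of the model computes per token, but in a typical
# deployment every expert still has to be loaded somewhere.
print(f"Active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

The takeaway is that MoE sparsity cuts compute per token, not the memory you need to hold the model, so self-hosting at this scale is a multi-node decision.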
How to Use It Today
V4-Pro is available via the official DeepSeek API, DeepInfra, Together.ai, OpenRouter, and NVIDIA NIM. The official API is OpenAI-compatible, so migrating from an existing OpenAI setup comes down to swapping the API key and base URL and changing the model name:
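```python
from openai import OpenAI

# Point the standard OpenAI client at DeepSeek's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com/v1",
)

# Same chat-completions call as before; only the model name changes.
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(response.choices[0].message.content)
```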
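Because the other providers also expose OpenAI-compatible endpoints, the same client can be pointed at any of them by configuration. The sketch below keeps that in one place. The environment variable names are my own convention, and the OpenRouter and Together.ai base URLs are the ones those providers document for OpenAI-compatible access as far as I know, so verify them (and each provider's exact model ID, which is not shown here) before relying on this.

```python
import os

# Base URLs for OpenAI-compatible endpoints. The DeepSeek URL comes from the
# example above; the others should be checked against each provider's docs.
# The key_env names are hypothetical, pick whatever your secrets setup uses.
PROVIDERS = {
    "deepseek": {
        "base_url": "https://api.deepseek.com/v1",
        "key_env": "DEEPSEEK_API_KEY",
    },
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "key_env": "OPENROUTER_API_KEY",
    },
    "together": {
        "base_url": "https://api.together.xyz/v1",
        "key_env": "TOGETHER_API_KEY",
    },
}


def client_kwargs(provider: str) -> dict:
    """Build keyword arguments for OpenAI(...) for the chosen provider."""
    cfg = PROVIDERS[provider]
    return {
        "base_url": cfg["base_url"],
        "api_key": os.environ.get(cfg["key_env"], ""),
    }


# Usage (requires the openai package and a real key):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("deepseek"))
print(client_kwargs("deepseek")["base_url"])
```

Keeping the provider choice in config makes it cheap to A/B providers on latency and price without touching call sites.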
The official API is currently running a 75% discount (until May 31), which puts the effective output rate at roughly $0.87 per million tokens. Together.ai leads on latency at 0.99 seconds time-to-first-token. DeepInfra is the better choice for sustained production load, with FP4 quantization and cached-token pricing at $0.145 per million tokens.
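If you are budgeting around the promotion, it helps to price a job both ways so the post-discount bill is not a surprise. A minimal sketch using the list price and discount quoted above; the 20M-token job size is a hypothetical example.

```python
LIST_OUTPUT_PRICE = 3.48  # USD per 1M output tokens, from the article
DISCOUNT = 0.75           # 75% promotional discount, from the article


def effective_price(list_price: float, discount: float) -> float:
    """Price per 1M tokens after applying a fractional discount."""
    return list_price * (1.0 - discount)


def job_cost(output_tokens: int, price_per_m: float) -> float:
    """Cost in USD of generating the given number of output tokens."""
    return output_tokens / 1_000_000 * price_per_m


promo_price = effective_price(LIST_OUTPUT_PRICE, DISCOUNT)
job_tokens = 20_000_000  # hypothetical job size, not from the article

print(f"Promo price: ${promo_price:.2f} per 1M output tokens")
print(f"Job at promo price: ${job_cost(job_tokens, promo_price):.2f}")
print(f"Job at list price:  ${job_cost(job_tokens, LIST_OUTPUT_PRICE):.2f}")
```

Plan against the list price and treat the promotional rate as a temporary bonus.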
What the Architecture Buys You
The 1-million-token context window is only economical because of V4-Pro’s hybrid attention design. It combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), interleaved across layers. At 1M-token context, this requires 27% of the inference FLOPs and 10% of the KV cache compared to DeepSeek V3.2. That efficiency is what makes long-context tasks viable at current pricing rather than prohibitive.
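Those two ratios translate into capacity fairly directly. The sketch below turns them into "how much more fits on the same hardware" numbers. The ratios are the ones quoted above; the 100 GB baseline for a V3.2 KV cache at 1M tokens is a placeholder I made up to show the arithmetic, not a measured value.

```python
FLOPS_RATIO = 0.27     # V4-Pro inference FLOPs vs. V3.2 at 1M context (from the article)
KV_CACHE_RATIO = 0.10  # V4-Pro KV cache vs. V3.2 at 1M context (from the article)


def reduction_factor(ratio: float) -> float:
    """Convert 'needs X of the baseline' into 'N times less than the baseline'."""
    return 1.0 / ratio


def v4_kv_cache_gb(v32_kv_cache_gb: float) -> float:
    """Scale a V3.2 KV-cache size down by the quoted ratio."""
    return v32_kv_cache_gb * KV_CACHE_RATIO


# Placeholder baseline, purely illustrative.
HYPOTHETICAL_V32_KV_GB = 100.0

print(f"Compute per token: ~{reduction_factor(FLOPS_RATIO):.1f}x less")
print(f"KV cache:          ~{reduction_factor(KV_CACHE_RATIO):.0f}x less")
print(f"KV cache at 1M tokens: ~{v4_kv_cache_gb(HYPOTHETICAL_V32_KV_GB):.0f} GB "
      f"vs {HYPOTHETICAL_V32_KV_GB:.0f} GB (hypothetical baseline)")
```

In other words, the memory saving is the bigger lever: roughly ten long-context sessions fit in the cache space one used to take, assuming the quoted ratio holds for your serving stack.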
Be Honest About the Gaps
The NIST CAISI evaluation released May 3 is worth reading before you migrate anything critical. Across 9 benchmarks — including two non-public datasets designed to resist benchmark contamination — CAISI found V4-Pro performs comparably to GPT-5 (released around 8 months ago), not the current closed frontier. DeepSeek’s self-reported numbers look better because self-reported numbers usually do.
Two gaps matter for developers. On Terminal-Bench 2.0, which tests agentic terminal workflows, V4-Pro scores 67.9% versus GPT-5.5’s 82.7% — a 14.8-point gap that matters if your agents need to navigate real shell environments. And V4-Pro is text-only at launch. No vision, no multimodal. Teams doing UI validation or image-in-context work need to stay on closed models for now.
When to Use It
V4-Pro is the right call for coding-heavy workloads at scale where the 0.2-point SWE-bench gap versus Opus 4.6 is acceptable. It’s compelling for any team with data residency requirements that rule out managed APIs. At $3.48 per million output tokens, it’s worth running alongside your current provider and measuring real task performance on your actual workloads, not just reading benchmark tables.
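Running both side by side can be as simple as feeding the same tasks to two models and comparing pass rates. Here is a minimal offline sketch of that loop. The `stub_*` functions and the substring checker are stand-ins I invented so the example runs without network access; in practice each model would be a real API call and each check would run your own tests in a sandbox.

```python
from typing import Callable, Dict, List, Tuple

# A task pairs a prompt with a checker that decides whether an answer passes.
Task = Tuple[str, Callable[[str], bool]]
Model = Callable[[str], str]


def compare_models(models: Dict[str, Model], tasks: List[Task]) -> Dict[str, float]:
    """Return each model's pass rate over the same list of tasks."""
    results = {}
    for name, ask in models.items():
        passed = sum(1 for prompt, check in tasks if check(ask(prompt)))
        results[name] = passed / len(tasks)
    return results


# Stand-in models (hypothetical); replace with real client calls.
def stub_candidate(prompt: str) -> str:
    return "def add(a, b):\n    return a + b"


def stub_incumbent(prompt: str) -> str:
    return "def add(a, b):\n    return a - b"


# Toy checker: a real one would execute the code against unit tests.
def looks_like_correct_add(answer: str) -> bool:
    return "return a + b" in answer


tasks: List[Task] = [("Write add(a, b) in Python.", looks_like_correct_add)]
scores = compare_models(
    {"candidate": stub_candidate, "incumbent": stub_incumbent}, tasks
)
print(scores)
```

Draw the task list from your own backlog, such as closed tickets with known-good patches, so the pass rates reflect your workload rather than a public benchmark.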
If you need reliable agentic terminal navigation, multimodal context, or you’re building something where the NIST benchmark gap matters — stay with GPT-5.5 or Claude Opus 4.7 for now. The open-source option is genuinely competitive; it’s not uniformly superior.
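The guidance above boils down to a small routing rule. This sketch encodes it so the decision is explicit and testable. The `TaskProfile` fields, the returned labels, and the order of the checks are my own assumptions about how to prioritize, not something the benchmarks dictate.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of what a workload needs."""
    needs_vision: bool = False
    agentic_terminal: bool = False
    data_must_stay_in_house: bool = False


def pick_model(task: TaskProfile) -> str:
    """Route a workload following the article's recommendations."""
    if task.needs_vision:
        # V4-Pro is text-only at launch.
        return "closed-model"
    if task.data_must_stay_in_house:
        # MIT-licensed weights allow self-hosting.
        return "deepseek-v4-pro (self-hosted)"
    if task.agentic_terminal:
        # The Terminal-Bench 2.0 gap favors the closed frontier for now.
        return "closed-model"
    return "deepseek-v4-pro"


print(pick_model(TaskProfile()))
print(pick_model(TaskProfile(agentic_terminal=True)))
print(pick_model(TaskProfile(data_must_stay_in_house=True)))
```

A workload that needs both vision and in-house data has no good answer under these rules, so handle that case explicitly rather than trusting the default order.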
The official V4 preview release notes and the Artificial Analysis provider comparison are the two resources worth bookmarking before you start evaluating providers.













