
Microsoft just shipped its first in-house coding model directly into GitHub Copilot. MAI-Code-1-Flash is live now, rolling out across all Copilot tiers — Free through Max — with no extra setup, no extra cost, and no waiting for a separate product announcement. If you use Copilot in VS Code, this model is either already in your picker or will be shortly.
Not Just Another Fine-Tune
The story here is not the model numbers. It’s the training approach. Every model currently available in GitHub Copilot — GPT-4o, Claude Haiku 4.5, Gemini — was built for general use and then plugged into Copilot’s interface. MAI-Code-1-Flash took a different path: Microsoft trained it directly inside the production Copilot harness developers use every day.
That means the model learned what a Copilot diff looks like, what “apply edit” means, how terminal commands are routed, and what the format of a real repository question-answering task looks like in the actual IDE context — not in a synthetic benchmark wrapper. Microsoft ran reinforcement learning across more than 150,000 environments and evaluated checkpoints against telemetry-grounded tasks drawn from real Copilot usage patterns.
Whether this approach delivers better real-world results than the benchmark scores alone would predict is the main thing to watch. The training methodology at least offers a coherent argument for why it might.
The Benchmark Numbers (With One Honest Caveat)
Microsoft reports strong results across standard coding benchmarks. On SWE-Bench Pro — a measure of real-world software engineering tasks — MAI-Code-1-Flash scores 51.2% against Claude Haiku 4.5’s 35.2%, a 16-point lead. On SWE-Bench Verified it reaches 71.6% versus Haiku’s 66.6%. Instruction following shows the widest gap: +28.9 points on IF Bench. Full details are available in the official model card.
The caveat worth noting: Microsoft compares exclusively to Claude Haiku 4.5. Haiku is Anthropic’s lightweight, fast tier; it is not Claude Sonnet 4.6, and it is not GPT-4o. MAI-Code-1-Flash is competing for the fast, cost-efficient coding slot, not the frontier reasoning slot. That’s the right positioning for an agentic coding model, but a single-baseline comparison leaves open where it sits in the broader landscape.
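To put the two SWE-Bench results quoted above side by side, here is a minimal sketch that computes both the absolute point gap and the relative gain over Haiku. The scores are the ones reported in this article; the dictionary layout and the function names are illustrative assumptions, not anything from Microsoft's model card.

```python
# Scores (percent) as quoted in this article; nothing else is assumed about the benchmarks.
reported = {
    "SWE-Bench Pro": {"MAI-Code-1-Flash": 51.2, "Claude Haiku 4.5": 35.2},
    "SWE-Bench Verified": {"MAI-Code-1-Flash": 71.6, "Claude Haiku 4.5": 66.6},
}


def point_gap(scores):
    """Absolute difference in percentage points (MAI minus Haiku)."""
    return round(scores["MAI-Code-1-Flash"] - scores["Claude Haiku 4.5"], 1)


def relative_gain(scores):
    """Gain relative to the Haiku score, in percent."""
    base = scores["Claude Haiku 4.5"]
    return round(100 * (scores["MAI-Code-1-Flash"] - base) / base, 1)


for name, scores in reported.items():
    print(f"{name}: +{point_gap(scores)} points, +{relative_gain(scores)}% relative")
```

The gap on SWE-Bench Verified is much smaller than on SWE-Bench Pro in both absolute and relative terms, which is worth keeping in mind when reading the headline 16-point figure.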
MoE Architecture and Token Efficiency
The model uses a sparse Mixture-of-Experts architecture: 137 billion total parameters with only 5 billion active at inference time. Combined with a 256K-token context window, it can handle large codebases without punishing you on cost. That 5B active footprint is why Microsoft can afford to route it through the Auto picker without dramatically increasing serving costs.
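As a rough illustration of what "sparse" means here, the sketch below computes the active-parameter fraction from the two figures above and shows a toy top-k router of the kind commonly used in Mixture-of-Experts layers. The expert count, the value of k, and the router logits are hypothetical; Microsoft has not published those details in anything cited here, so this is not a description of how MAI-Code-1-Flash actually routes tokens.

```python
import math

TOTAL_PARAMS_B = 137  # total parameters in billions, from the article
ACTIVE_PARAMS_B = 5   # active parameters per token in billions, from the article


def active_fraction(active, total):
    """Share of the parameters that participate in a single forward pass."""
    return active / total


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1.

    Generic top-k gating for illustration only; expert count and k are assumptions.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


print(f"Active share: {active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.2%}")

# Hypothetical router scores for one token across four experts.
print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Only the selected experts run for a given token, which is why serving cost tracks the active parameter count far more closely than the total.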
On harder problems, Microsoft reports up to 60% fewer tokens used to reach a correct solution compared to Claude Haiku 4.5, which it attributes to “adaptive solution length control,” where the model calibrates response depth to task complexity. The timing is notable: GitHub Copilot recently moved agent users to token-based billing, and some teams reported 10x–50x cost spikes. A model that uses fewer tokens for the same quality of output speaks directly to those cost concerns.
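To see what a token reduction means under token-based billing, here is a back-of-the-envelope sketch. The baseline token count and the per-million-token price are made-up placeholders, and the calculation assumes both models are billed at the same per-token rate, which this article does not state. The 60% figure is Microsoft's reported upper bound ("up to"), not a typical value.

```python
def tokens_after_reduction(baseline_tokens, reduction):
    """Tokens used after cutting `reduction` (a fraction in [0, 1]) from the baseline."""
    if not 0 <= reduction <= 1:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline_tokens * (1 - reduction))


def cost(tokens, price_per_million):
    """Cost of `tokens` at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical inputs: neither number comes from GitHub's or Microsoft's pricing.
BASELINE_TOKENS = 200_000      # tokens a baseline model spends on a hard task
PRICE_PER_MILLION = 5.00       # placeholder price in dollars
REPORTED_MAX_REDUCTION = 0.60  # "up to 60% fewer tokens", from the article

reduced = tokens_after_reduction(BASELINE_TOKENS, REPORTED_MAX_REDUCTION)
print(f"Baseline: {BASELINE_TOKENS} tokens, ${cost(BASELINE_TOKENS, PRICE_PER_MILLION):.2f}")
print(f"Reduced:  {reduced} tokens, ${cost(reduced, PRICE_PER_MILLION):.2f}")
```

At equal per-token prices, a 60% reduction in tokens is a 60% reduction in spend for that task; the real saving depends on actual pricing and on how often the best-case reduction applies.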
How to Enable It Now
On any Copilot plan, including the free tier, the model is rolling out through VS Code’s model picker. Rollout is gradual, so it may not be available to every user today; the GitHub Changelog entry confirms the phased rollout is ongoing. To select it manually:
- Open Copilot Chat in VS Code: Ctrl+Shift+I on Windows, Cmd+Shift+I on Mac
- Click the model picker at the bottom of the chat panel
- Select MAI-Code-1-Flash from the list
Alternatively, leave the Auto picker enabled. Microsoft has designed the Auto router to send each task to whichever model is best suited, and since it controls both the model and the router, there is a clear incentive for MAI-Code-1-Flash to receive a meaningful share of routing decisions. Whether that’s a feature or a red flag depends on your trust level.
What to Actually Try It On
Given the training focus — agentic coding, real repo tasks, refactoring, repository QA — the highest-value test is not autocomplete. Run it on a multi-file refactor, a bug hunt across a large codebase, or a repository question where it needs to read context before answering. Those are the scenarios where training-on-production-harness should show up, if it shows up anywhere.
For line completions and simple chat, the choice in the model picker matters less. Agent-mode tasks, where the model needs to plan, read files, and apply edits across multiple steps, are the right experiment to run. The GitHub community discussion thread is already collecting early impressions if you want real-world reports before committing.
The Bigger Picture
MAI-Code-1-Flash is the coding companion to MAI-Thinking-1, the reasoning model Microsoft announced at the same event. Together they represent Microsoft’s first fully in-house model family — built to reduce OpenAI dependency, serve at lower cost, and train on data Microsoft controls. The fast-coding tier has been a two-player market (Claude Haiku vs GPT-4o-mini) for the past year. Microsoft just entered it with their own stake, built into the product 15 million developers already use every day.













