MiMo Code Claims to Beat Claude Code: Read the Fine Print

Split-screen comparison of MiMo Code and Claude Code terminal interfaces showing persistent memory architecture

Xiaomi’s MiMo AI team released MiMo Code V0.1.0 on June 10, 2026 — an open-source, terminal-native coding agent claiming to score 82% on SWE-bench Verified versus Claude Code’s 79%. The numbers are vendor self-reported and don’t appear on any public leaderboard. However, the architecture behind those numbers is a different story entirely.

The Problem MiMo Code Is Actually Solving

Context window exhaustion on long autonomous coding tasks isn’t a fringe complaint. Long-context recall tends to be U-shaped: models retrieve facts at the beginning and end of a long prompt reliably, but recall for material in the middle 40–60% of the prompt drops by 25–40%. In practice, run a coding agent for four hours on a complex refactor and it starts forgetting code it wrote two hours ago. The session breaks, you restart, and the cycle continues.

This is the real blocker for production coding agents — not raw intelligence, but memory durability. MiMo Code attempts to solve it at the architecture level rather than simply expanding the context window further. That bet is worth paying attention to, even if the headline numbers need scrutiny.

How MiMo Code’s Four-Layer Memory Works

MiMo Code implements four persistent memory layers running simultaneously:

checkpoint.md — Live session state and task progress, auto-generated continuously
MEMORY.md — Persistent project knowledge: architecture decisions, conventions, constraints. Human-editable Markdown that survives session restarts
Global preferences — Cross-project user settings that persist between codebases
Raw history traces — Full reconstruction fallback for complex task recovery

Importantly, a dedicated background subagent handles memory condensation independently from the main coding agent. It doesn’t compete for the primary agent’s token budget. When context approaches capacity limits, the background agent condenses everything into structured summaries, then injects the most relevant context within a 65K token budget on resume — prioritizing recent checkpoints over raw history.
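
The resume step described above can be sketched in a few lines of Python. This is a minimal illustration, not MiMo Code’s actual implementation: the layer names, the priority order, the MemoryItem type, and the four-characters-per-token estimate are all assumptions made for the example. Only the 65K budget and the "recent checkpoints before raw history" rule come from the release description.

```python
from dataclasses import dataclass

TOKEN_BUDGET = 65_000  # resume budget quoted in the article

# Hypothetical priority order (assumption): lower number is injected first.
LAYER_PRIORITY = {"checkpoint": 0, "memory": 1, "preferences": 2, "history": 3}


@dataclass
class MemoryItem:
    layer: str       # one of the keys in LAYER_PRIORITY
    timestamp: int   # larger means more recent
    text: str


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: about four characters per token."""
    return max(1, len(text) // 4)


def select_for_resume(items, budget=TOKEN_BUDGET):
    """Greedily pick items by layer priority, then recency, within the budget."""
    ordered = sorted(items, key=lambda m: (LAYER_PRIORITY[m.layer], -m.timestamp))
    chosen, used = [], 0
    for item in ordered:
        cost = estimate_tokens(item.text)
        if used + cost <= budget:  # skip anything that would overflow
            chosen.append(item)
            used += cost
    return chosen


items = [
    MemoryItem("history", 1, "x" * 400),     # about 100 tokens
    MemoryItem("checkpoint", 5, "y" * 200),  # about 50 tokens
    MemoryItem("checkpoint", 9, "z" * 200),  # about 50 tokens
    MemoryItem("memory", 2, "m" * 120),      # about 30 tokens
]
picked = select_for_resume(items, budget=130)
print([(m.layer, m.timestamp) for m in picked])
# [('checkpoint', 9), ('checkpoint', 5), ('memory', 2)]
```

With a 130-token budget the two checkpoints and the project memory fit, and the raw history trace is left out. A real system would presumably rank by relevance as well as recency.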

Two additional commands extend the system further. The /dream command runs every seven days to merge, deduplicate, and compress historical sessions into compact state files. Additionally, /distill mines past sessions for repeated workflows and packages them into reusable skills — similar to how Amazon’s AgentCore Memory approaches long-term agent knowledge.
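
To make the idea behind /distill concrete, here is one way "mining past sessions for repeated workflows" could look. This is a hypothetical sketch, not how MiMo Code does it: it assumes each session is already reduced to an ordered list of step names, and it treats any fixed-length run of steps that shows up in at least two sessions as a candidate skill. The function name and parameters are invented for the example.

```python
from collections import Counter


def repeated_workflows(sessions, length=3, min_sessions=2):
    """Return step sequences of `length` that occur in at least `min_sessions` sessions."""
    seen_in = Counter()
    for steps in sessions:
        # A set ensures a sequence counts once per session, however often it repeats.
        windows = {
            tuple(steps[i:i + length])
            for i in range(len(steps) - length + 1)
        }
        seen_in.update(windows)
    return sorted(seq for seq, count in seen_in.items() if count >= min_sessions)


sessions = [
    ["git pull", "run tests", "fix lint", "commit"],
    ["edit", "run tests", "fix lint", "commit", "push"],
    ["run tests", "deploy"],
]
print(repeated_workflows(sessions))
# [('run tests', 'fix lint', 'commit')]
```

Only the test, lint, commit run appears in two sessions, so it is the one sequence flagged as reusable.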

The independently verifiable claim: running MiMo-V2.5-Pro inside both Claude Code and MiMo Code shows the architecture itself contributes roughly five percentage points of benchmark improvement, independent of model quality. That part holds up under scrutiny.

About Those MiMo Code Benchmark Numbers

MiMo Code claims 82% on SWE-bench Verified (Claude Code: 79%), 62% on SWE-bench Pro (Claude Code: 57%), and 73% on Terminal-Bench 2.0 (Claude Code: 69%). None of these numbers appear on official public leaderboards as of this writing.

There’s also an inconsistency worth flagging. OpenAI’s Codex CLI officially scores 82.2% on Terminal-Bench 2.0, well above MiMo Code’s claimed 73%. And for MiMo Code to outperform Claude Code at 73%, Claude Code would need to score below 73% on the same benchmark, which doesn’t align with what public results show. As TechTimes noted, these scores are self-reported. The human A/B evaluation across 1,213 head-to-head pairs is a valid methodology, but it’s not the same as the SWE-bench methodology. These numbers may check out once formally submitted. Right now, however, “82% on SWE-bench” without a leaderboard entry is a marketing claim, not a result.

The Practical Case For and Against

The case for trying MiMo Code is straightforward. It’s MIT licensed: you can fork it, audit it, and self-host it, none of which Claude Code allows. The default model (MiMo-V2.5-Pro) runs at $0.40/$2.00 per million input/output tokens versus Claude Opus at $5.00/$25.00, a 12.5x price difference on both input and output. It also supports multiple backends (DeepSeek, Kimi, GLM, custom), so you’re not locked into Xiaomi’s model stack, and it imports existing Claude Code configurations without a migration rewrite.
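
To see what those per-token prices mean for a single session, the arithmetic is easy to run yourself. The prices below are the ones quoted above. The session size (2 million input tokens, 200,000 output tokens) is an arbitrary assumption for illustration, and the function name is invented for the example.

```python
# (input, output) price in USD per million tokens, as quoted in the article
PRICES = {
    "MiMo-V2.5-Pro": (0.40, 2.00),
    "Claude Opus": (5.00, 25.00),
}


def session_cost(model, input_tokens, output_tokens):
    """Cost in USD for one session at the listed per-million-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical long agent session: 2M input tokens, 200K output tokens.
mimo = session_cost("MiMo-V2.5-Pro", 2_000_000, 200_000)
opus = session_cost("Claude Opus", 2_000_000, 200_000)
print(f"MiMo: ${mimo:.2f}, Opus: ${opus:.2f}, ratio: {opus / mimo:.1f}x")
# MiMo: $1.20, Opus: $15.00, ratio: 12.5x
```

Because both the input and output prices differ by the same factor, the ratio stays at 12.5x whatever the mix of input and output tokens.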

Installation takes a single command on macOS or Linux:

curl -fsSL https://mimo.xiaomi.com/install | bash

However, the case against is equally real. This is a v0.1.0 release, and first public versions of a tool tend to ship with rough edges and unstable APIs. The free tier routes traffic through Xiaomi’s servers, which is a non-starter for proprietary code. The curl | bash installation pattern raises standard supply chain concerns, since the script runs before you can inspect it. And the benchmark numbers need independent verification before anyone makes infrastructure decisions based on them.
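
One common mitigation for the curl | bash concern is to download the installer to disk, read it, and compare its hash against a published value before running anything. The sketch below shows only the comparison step. It assumes a SHA-256 digest is available from a channel you trust, and nothing in the release material confirms that Xiaomi publishes one. The function name is invented for the example.

```python
import hashlib
import hmac


def matches_published_digest(script_bytes: bytes, expected_sha256_hex: str) -> bool:
    """Return True if the downloaded script hashes to the expected SHA-256 value."""
    actual = hashlib.sha256(script_bytes).hexdigest()
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(actual, expected_sha256_hex.strip().lower())


# Stand-in for an installer saved to disk instead of piped straight into bash.
installer = b"#!/bin/sh\necho installing\n"
published = hashlib.sha256(installer).hexdigest()  # placeholder for a vendor-published digest

print(matches_published_digest(installer, published))
print(matches_published_digest(installer + b"rm -rf ~\n", published))
```

The first check passes and the second fails, because appending a single line to the script changes its digest.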

What This Actually Signals

MiMo Code isn’t a standalone product. Rather, it’s the developer-facing entry point for Xiaomi’s broader MiMo platform — which includes MiMo SoloEngine, a no-code agent-building environment targeting non-technical users. The free, MIT-licensed, impressively-benchmarked-on-paper approach is the same playbook that made DeepSeek’s open-source release a worldwide event: ship something compelling for free, capture Western developer mindshare, sell enterprise services upstream.

Whether or not MiMo Code’s benchmark claims survive external scrutiny, the architecture it proposes — persistent, structured, repo-aware memory managed by a dedicated background process — is going to matter. The New Stack’s analysis of the coding agent endurance gap makes the case clearly: the teams that win in autonomous coding will solve memory, not context window size. MiMo Code is betting on the right problem. Whether their numbers reflect a real solution or optimistic self-scoring is what independent benchmarks will determine.

Read the official MiMo Code release announcement for full technical details. Just keep the v0.1.0 maturity label in mind before running it on production code.

ByteBot
