
Grok Build: xAI’s Coding Agent CLI Enters the Race — Honest Review

Figure: Split-screen comparison of Grok Build and Claude Code showing parallel agent architecture and benchmark scores.

xAI shipped Grok Build on May 14 — a terminal-native coding agent that goes directly after Claude Code and Codex CLI. It runs parallel subagents, defaults to a Plan Mode that blocks edits until you approve, and supports MCP out of the box. It also scores 70.8% on SWE-Bench Verified, against Claude Code’s 87.6%. Both of those facts are true. Both matter.

What Grok Build Is

Grok Build is a CLI-first coding agent powered by grok-code-fast-1, a model xAI built from scratch on programming content and real-world pull requests. Installation is a single command:

npm install -g @xai/grok-build

After that, you authenticate, navigate to a project directory, and describe a task. Grok Build then plans the work, searches documentation, and writes code across the codebase.

The multi-agent architecture is the headline feature. Grok Build spawns up to eight subagents in parallel, each working in its own Git worktree, so their edits stay isolated from one another. An automated evaluation layer called Arena Mode scores competing outputs before you review them. In practice, a large refactor that would take one agent an hour can be split across several agents working in isolation.
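To make the worktree isolation concrete, here is a minimal Python sketch of the general technique: one `git worktree` per task, then the tasks run in parallel. It is not Grok Build's implementation. The function names, the `agent/<task>` branch naming, and the `work` callback are all hypothetical, and the sketch assumes `git` is installed and the repository already has at least one commit.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

MAX_AGENTS = 8  # upper bound on parallel subagents quoted in the article


def worktree_add_cmd(repo: Path, root: Path, task_id: str) -> list[str]:
    """Build the `git worktree add` command for one task (pure, easy to test)."""
    # -b creates a fresh branch, so every task gets its own branch and checkout.
    # The "agent/<task_id>" naming is an assumption made for this sketch.
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", f"agent/{task_id}", str(root / task_id)]


def run_isolated(repo: Path, root: Path, task_ids: list[str],
                 work: Callable[[Path], object]) -> dict[str, object]:
    """Create one worktree per task, then call work(path) for each in parallel."""
    if len(task_ids) > MAX_AGENTS:
        raise ValueError(f"at most {MAX_AGENTS} parallel tasks are supported")
    # Worktrees are created one at a time to avoid contending on repository locks.
    for task_id in task_ids:
        subprocess.run(worktree_add_cmd(repo, root, task_id),
                       check=True, capture_output=True)
    # Each worker only ever sees its own directory, so edits cannot collide.
    with ThreadPoolExecutor(max_workers=MAX_AGENTS) as pool:
        futures = {t: pool.submit(work, root / t) for t in task_ids}
        return {t: f.result() for t, f in futures.items()}
```

Because each task lives on its own branch, comparing or scoring the competing results afterwards reduces to comparing branches, which is presumably the kind of input an evaluation layer like Arena Mode would work from.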

It also ships with MCP support out of the box, picks up AGENTS.md — the cross-vendor convention used by Claude Code and Codex CLI — and supports headless mode for automation and CI/CD pipelines via the -p flag. The Agent Coordination Protocol (ACP) integration lets teams build custom orchestration layers on top of Grok Build.

Plan Mode Is the Real Differentiator

The feature worth paying attention to is Plan Mode, which is on by default. Before Grok Build touches a single file, it proposes a step-by-step plan. You can approve it, comment on individual steps, or rewrite it entirely. Nothing runs until you sign off.
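The approve-before-execute flow can be pictured as a small gate in front of the executor. The sketch below is purely illustrative and is not xAI's design: `Step`, `Plan`, and `execute` are hypothetical names, and the rule that a new comment withdraws approval is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    comment: str = ""  # reviewer feedback attached to this step


@dataclass
class Plan:
    steps: list[Step]
    approved: bool = False

    def comment_on(self, index: int, text: str) -> None:
        """Attach feedback to one step; assumed here to withdraw any approval."""
        self.steps[index].comment = text
        self.approved = False

    def approve(self) -> None:
        self.approved = True


def execute(plan: Plan, apply_step: Callable[[Step], object]) -> list[object]:
    """Run every step, but only after the plan has been explicitly approved."""
    if not plan.approved:
        raise PermissionError("plan not approved; nothing was modified")
    return [apply_step(step) for step in plan.steps]
```

The point of the gate is that the check happens once, before any step runs, so an unapproved plan cannot leave the codebase half-modified.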

This directly addresses the thing developers hate most about coding agents: the agent does something, and by the time you notice it was wrong, three other things have already changed downstream. Plan Mode does not solve this perfectly — agents can still make bad decisions within an approved plan — but it puts a meaningful checkpoint between “task given” and “codebase modified.”

Claude Code does not have a native equivalent. That is a genuine edge for Grok Build, not marketing spin. xAI’s official Grok Build announcement describes Plan Mode as the central design decision — every other feature builds around the assumption that developers want to stay in the loop.

The Benchmark Gap Is Not a Rounding Error

xAI published a score of 70.8% on SWE-Bench Verified for grok-code-fast-1. The current SWE-Bench leaderboard has Claude Opus 4.7 Adaptive at 87.6% and OpenAI’s Codex at 85%. That puts Grok Build 16.8 points behind Claude and 14.2 points behind Codex. A gap of roughly 17 points is not a rounding error, and it is not a methodology disagreement.
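For readers who want to check the arithmetic, a short Python snippet reproduces the gaps from the three scores quoted above. The dictionary keys are just labels chosen for this example.

```python
# SWE-Bench Verified scores as quoted in this article (percent).
scores = {
    "grok-code-fast-1": 70.8,
    "Claude Opus 4.7 Adaptive": 87.6,
    "Codex": 85.0,
}


def gap(leader: str, trailer: str = "grok-code-fast-1") -> float:
    """Percentage-point difference between two quoted scores."""
    return round(scores[leader] - scores[trailer], 1)


print(gap("Claude Opus 4.7 Adaptive"))  # 16.8
print(gap("Codex"))                     # 14.2
```

These are percentage points, not relative percentages: the difference is in the share of benchmark tasks solved.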

xAI’s response is to note that “SWE-Bench doesn’t fully reflect the nuances of real-world software engineering.” That is technically true of every benchmark. It does not close a 17-point gap. Independent replication of the 70.8% score has not yet been published, which is not a red flag on its own for a v0.1 release, but is worth noting.

On complex, multi-file, real-world tasks — exactly the use case Grok Build targets — benchmark scores do map to observable performance differences. At 70.8%, Grok Build is competitive with models from roughly a year ago. The parallel agent architecture helps, but it does not compensate for the underlying model gap when tasks require deep reasoning rather than parallel execution.

| Tool | SWE-Bench Verified | Price | Parallel Agents | Plan Mode |
|---|---|---|---|---|
| Grok Build | 70.8% | $30–300/mo | Yes (up to 8) | Yes (default) |
| Claude Code | 87.6% | $20/mo | No | No |
| Codex CLI | 85% | Pay-per-use | No | No |

Pricing: Who This Is Actually For

Access started at $300/month (SuperGrok Heavy) and has since opened to SuperGrok ($30/month) and X Premium+ ($40/month), with a promotional tier at $99/month for early adopters. API access is priced at $0.20 per million input tokens and $1.50 per million output tokens.

Claude Code costs $20/month flat. At the $30 SuperGrok tier, Grok Build is not prohibitively expensive — but you are paying $10 more for a tool that currently underperforms on benchmarks. The $99–300 tiers are enterprise plays. For solo developers and small teams, Claude Code’s pricing structure is more accessible. DevOps.com’s analysis notes that the $300 tier is “aimed at professional engineering teams where the cost disappears into headcount.”
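To put the API rates quoted above next to the subscription prices, here is a rough Python estimate. Only the two per-token rates come from the article. The monthly token volumes are made-up numbers for illustration, not measurements of real Grok Build usage.

```python
# API rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_M = 0.20
OUTPUT_USD_PER_M = 1.50


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated API spend for a given number of input and output tokens."""
    return (input_tokens / 1_000_000 * INPUT_USD_PER_M
            + output_tokens / 1_000_000 * OUTPUT_USD_PER_M)


# Hypothetical month: 50M input tokens and 5M output tokens.
monthly = api_cost_usd(50_000_000, 5_000_000)
print(f"${monthly:.2f}")  # $17.50
```

Under that assumed workload, API usage would come in below the $30 SuperGrok tier. A heavier workload changes the picture, so the numbers are worth rerunning with your own token counts.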

The Honest Assessment

Grok Build is a v0.1 release that shows architectural ambition most first-version tools skip. Parallel worktrees, Plan Mode, ACP, headless automation — xAI is betting that multi-agent parallelism and human-in-the-loop planning will outweigh raw benchmark performance over time. That is a defensible bet.

It is not a bet that has paid off yet. The model powering Grok Build needs to close the benchmark gap before the architectural advantages become the story. Right now, the architecture is interesting and the model is catching up. That is different from being ready to replace Claude Code in production.

If you have a SuperGrok subscription, run Grok Build on a real project this week. Plan Mode alone is worth experiencing. For a deeper look at where the coding agent market stands, MarkTechPost’s benchmark-driven ranking is a useful reference. Just do not expect Grok Build to outperform tools with a 17-point benchmark lead — not yet.

ByteBot
