Cloudflare’s AI Code Review: 7 Agents, $1.19 Each


Cloudflare just published 30 days of production data on its AI code review system: 131,246 review runs across 48,095 merge requests, at a median cost of $0.98 and a median completion time of 3 minutes 39 seconds. Not a beta. Not a proof-of-concept. Full production, across all 5,169 Cloudflare repositories. The numbers are hard to argue with.

The Problem It’s Solving

AI coding tools made a specific problem dramatically worse. Teams using AI assistants are generating 98% more pull requests while their review time has climbed 91%, according to LinearB’s 2026 analysis of 8.1 million PRs. AI-generated PRs wait 4.6x longer before a reviewer even picks them up. The tools that were supposed to speed teams up created a new, very human bottleneck. As Cloudflare’s engineering team put it: “Code review is a fantastic mechanism for catching bugs and sharing knowledge, but it is also one of the most reliable ways to bottleneck an engineering team.”

The Architecture: Specialists Beat Generalists

The core insight is not that Cloudflare used AI for code review — it’s how they structured it. Rather than pointing one model at a diff with a generic prompt, they run up to seven specialized agents per merge request:

Security — flags only exploitable or concretely dangerous issues; ignores theoretical risks
Code Quality — logic errors and best practices
Performance — efficiency concerns
Documentation — completeness and clarity
Release Management — deployment readiness
Compliance — adherence to Cloudflare’s internal Engineering Codex
AGENTS.md — whether the repo’s AI instruction file needs updating

A coordinator agent, running on Claude Opus 4.7 or GPT-5.4, reads the sub-reviewers' outputs, deduplicates overlapping findings, re-categorizes issues, filters out speculative noise, and posts a single structured review comment. The coordinator is the only component running frontier-tier models; the heavy-lifting sub-reviewers run on Claude Sonnet 4.6 or GPT-5.3 Codex, and text-heavy agents like Documentation run on Kimi K2.5 to keep costs down.
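The fan-out and merge pattern described above can be sketched in a few lines of Python. This is an illustration under loud assumptions, not Cloudflare's implementation: the real coordinator is a language model, while the `coordinate` function here is a deterministic stand-in, and the `Finding` fields, the severity names, and the two stub reviewers are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical severity scale; lower number means more severe.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "suggestion": 2}


@dataclass(frozen=True)
class Finding:
    agent: str         # which specialist reported it
    path: str          # file the finding refers to
    line: int          # line number in the diff
    severity: str      # one of the SEVERITY_ORDER keys
    speculative: bool  # True if no concrete failure was identified
    message: str


def run_specialists(diff, specialists):
    """Run every specialist on the same diff concurrently; flatten the results."""
    with ThreadPoolExecutor(max_workers=len(specialists)) as pool:
        per_agent = list(pool.map(lambda review: review(diff), specialists.values()))
    return [finding for findings in per_agent for finding in findings]


def coordinate(findings):
    """Drop speculative findings, keep the most severe one per location, sort."""
    merged = {}
    for f in findings:
        if f.speculative:
            continue
        key = (f.path, f.line)
        kept = merged.get(key)
        if kept is None or SEVERITY_ORDER[f.severity] < SEVERITY_ORDER[kept.severity]:
            merged[key] = f
    return sorted(merged.values(),
                  key=lambda f: (SEVERITY_ORDER[f.severity], f.path, f.line))


# Stub reviewers standing in for model-backed agents (assumed for illustration).
def security_reviewer(diff):
    return [Finding("security", "auth/login.py", 42, "critical", False,
                    "token compared with ==")]


def quality_reviewer(diff):
    return [
        Finding("code_quality", "auth/login.py", 42, "warning", False,
                "non-constant-time comparison"),
        Finding("code_quality", "util.py", 7, "suggestion", True,
                "might be slow on large inputs"),
    ]


review = coordinate(run_specialists(
    "<diff text>",
    {"security": security_reviewer, "code_quality": quality_reviewer},
))
```

With these stubs, the two findings on the same line collapse into the security reviewer's critical one and the speculative suggestion is dropped, so `review` holds a single entry. Keying duplicates on file and line is the simplest possible rule; a model-based coordinator can also merge findings that describe the same problem at different locations.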

This specialization matters more than it sounds. A single model with a generic “review this code” prompt is essentially being asked to be a security expert, a documentation auditor, and a compliance checker simultaneously. Specialization produces fewer but higher-quality findings — 1.2 per review on average, with the security reviewer producing the highest critical-issue rate at 4%.

The Economics

The system is also risk-tiered, which is what makes the unit economics work:

Tier	Lines Changed	Agents	Median Cost
Trivial	≤10 lines	2	$0.20
Lite	≤100 lines	4	$0.67
Full	>100 lines or security-sensitive	7+	$1.68

You do not send Claude Opus to review a README typo fix. Security-sensitive files — anything touching auth/ or crypto/ directories — always trigger full review regardless of diff size.
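The routing rule in the table and the paragraph above can be written as a small function. This is a minimal sketch: the thresholds and the auth/ and crypto/ directories come from the article, but the function names, the tier labels as strings, and the idea of matching any path component are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Directories the article names as always triggering a full review.
SENSITIVE_DIRS = {"auth", "crypto"}


def touches_sensitive_path(paths):
    """True if any changed file sits under a directory named auth or crypto."""
    return any(
        SENSITIVE_DIRS & set(PurePosixPath(p).parts[:-1])  # ignore the file name
        for p in paths
    )


def select_tier(lines_changed, paths):
    """Pick a review tier from diff size, with a sensitive-path override."""
    if touches_sensitive_path(paths):
        return "full"      # sensitive files skip the size check entirely
    if lines_changed <= 10:
        return "trivial"
    if lines_changed <= 100:
        return "lite"
    return "full"


print(select_tier(3, ["README.md"]))            # small docs fix
print(select_tier(3, ["src/auth/login.py"]))    # small, but under auth/
print(select_tier(250, ["src/render/page.py"])) # large diff
```

Checking the sensitive-path override before the size thresholds is the important design choice: a three-line change under auth/ gets the full set of agents, while a three-line README fix gets the cheapest tier.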

The team processed roughly 120 billion tokens per month and kept costs manageable through an 85.7% prompt cache hit rate, saving an estimated five figures monthly. The trick: instead of duplicating the full MR context across all seven concurrent agents, they write it to disk once and have each agent read the shared file — eliminating a 7x token multiplication.
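To make the shared-file idea concrete, here is a hedged sketch that compares the two ways of handing context to seven agents. Everything in it is an assumption for illustration: the file name `mr_context.md`, the prompt wording, and the use of character counts as a rough stand-in for tokens. It only measures the size of the prompts being dispatched; it says nothing about how Cloudflare's agents read or cache the file.

```python
import tempfile
from pathlib import Path

AGENTS = ["security", "code_quality", "performance", "documentation",
          "release_management", "compliance", "agents_md"]


def inline_prompts(context):
    """Naive approach: paste the full MR context into every agent's prompt."""
    return {a: f"You are the {a} reviewer.\n\n{context}" for a in AGENTS}


def shared_file_prompts(context, workdir):
    """Write the context once; each prompt carries only the file path."""
    path = Path(workdir) / "mr_context.md"  # hypothetical file name
    path.write_text(context, encoding="utf-8")
    prompts = {
        a: f"You are the {a} reviewer. Read the merge request context from {path}."
        for a in AGENTS
    }
    return prompts, path


# A synthetic stand-in for a large merge request context.
context = "diff --git a/app.py b/app.py\n" * 2000

with tempfile.TemporaryDirectory() as workdir:
    inline = inline_prompts(context)
    shared, path = shared_file_prompts(context, workdir)
    stored = path.read_text(encoding="utf-8")

inline_chars = sum(len(p) for p in inline.values())
shared_chars = sum(len(p) for p in shared.values())
print(f"inline prompts: {inline_chars} chars, shared-file prompts: {shared_chars} chars")
```

The inline total grows as seven times the context size, while the shared-file total stays roughly constant regardless of how large the merge request is.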

What It Still Cannot Do

Cloudflare is refreshingly direct about the limitations. The system struggles with architectural awareness: it sees the diff but not the design intent behind it. It cannot verify that all downstream consumers of an API have been updated when a contract changes. It catches an obviously missing lock but not a subtle deadlock. And a 500-file refactor run through the full seven-agent pipeline costs real money.

The break-glass override — where a comment of “break glass” forces approval regardless of AI findings — was used only 288 times across 48,095 merge requests (0.6%). Engineers almost never need to override it.
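The ratios behind these figures are easy to check. The short calculation below uses only numbers quoted in the article; the variable names are illustrative.

```python
merge_requests = 48_095
review_runs = 131_246
break_glass_uses = 288

# Share of merge requests where an engineer forced approval.
override_rate_pct = break_glass_uses / merge_requests * 100

# Average number of review runs each merge request received.
runs_per_mr = review_runs / merge_requests

print(f"override rate: {override_rate_pct:.1f}%")
print(f"review runs per merge request: {runs_per_mr:.2f}")
```

The second ratio shows that a typical merge request is reviewed more than once, which is consistent with reviews being re-run as new commits are pushed, though the article does not say so explicitly.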

What This Means for Engineering Teams

The architecture here (multi-agent, specialized, coordinator-synthesized) is the template that other engineering teams will copy. The implementation details (OpenCode, GitLab, Cloudflare Workers KV for the control plane) are specific to Cloudflare, but the pattern is transferable. Single-model generic code review produces noise. Seven specialized agents with a coordinator produce a review that engineering leads at Cloudflare are actually relying on.

The deeper issue Cloudflare’s post surfaces: the code review bottleneck is the unsexy problem that actually determines whether AI-assisted development delivers on its productivity promise. Generating code faster is worthless if it piles up waiting for review. At $1.19 per review with a median turnaround under four minutes, Cloudflare has a credible answer to that problem, built on top of OpenCode, the open-source agent that any team can start with today.

ByteBot
