Anthropic: AI Writes 80% of Its Code, Calls for Pause

On June 4, Anthropic published internal numbers it had never released before: Claude now authors over 80% of the code merged into Anthropic’s own codebase, up from single digits when Claude Code launched in February 2025. Engineers are shipping 8x more code per quarter than they were in 2024. And the company that built all of this has a clear-eyed warning: the acceleration may already be outpacing the safety work needed to handle it. The report, titled “When AI Builds Itself,” is the most concrete public accounting of what recursive self-improvement looks like at a working frontier lab.

From 4-Minute Tasks to 12-Hour Tasks in Two Years

The most defensible data in the report isn’t the lines-of-code count — it’s the task horizon table. In March 2024, Claude Opus 3 could reliably complete tasks up to four minutes long. In March 2025, Claude Sonnet 3.7 extended that to 90 minutes. By March 2026, Claude Opus 4.6 handles 12-hour tasks autonomously. Claude Mythos Preview, as of May 2026, reaches 16 hours or more. That doubling rate has accelerated from once every seven months to once every four months.
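
For readers who want to check a doubling rate themselves, here is a minimal Python sketch. It assumes smooth exponential growth between two of the figures quoted above, which is a simplification, and the function name is ours rather than anything from the report.

```python
import math


def doubling_time_months(start_minutes, end_minutes, elapsed_months):
    """Months per doubling, assuming steady exponential growth between two points."""
    doublings = math.log2(end_minutes / start_minutes)
    return elapsed_months / doublings


# March 2025 (90 minutes) to March 2026 (12 hours = 720 minutes), 12 months apart.
print(doubling_time_months(90, 720, 12))  # 720 / 90 = 8 = 2**3, so 12 / 3 = 4.0
```

Going from 90 minutes to 12 hours is three doublings in twelve months, which lines up with the four-month figure. Two endpoints are a crude fit, though, and other pairs of figures give different values; the report's own rates presumably draw on more data points than the four quoted here.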

The other numbers are hard to argue with, too. On open-ended coding tasks, Claude’s success rate reached 76% in May 2026, up 50 percentage points in six months. Code optimization speedups went from roughly 3x in May 2025 to 52x in April 2026. On research judgment tasks (choosing the right next step in an experiment), Claude beat human decisions 51% of the time in November 2025 and 64% in April 2026. The trend line is consistent across every measure.

Yes, the skeptics on Hacker News are partially right: lines of code is a flawed proxy for value. AI generates verbose code that buries human reviewers — one engineer declined to review a 40,000-line AI-generated PR because, as they put it, “I couldn’t possibly vet this.” However, lines of code isn’t the only metric here, and dismissing the entire report because of that one limitation misses the point.

The Bottleneck Has Already Shifted — and Most Developers Haven’t Noticed

The most important paragraph in Anthropic’s report isn’t the one about pause conditions. It’s this one: as code generation accelerated inside Anthropic, human code review became the new bottleneck. They’ve already hit Amdahl’s law in their own organization — throwing more AI at code production doesn’t help if humans can’t review output fast enough. Consequently, Anthropic launched Claude Code Review in March 2026 specifically to address this constraint.
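
Amdahl’s law is easy to see with numbers. The sketch below is illustrative only: the 60/40 split between writing and reviewing is a made-up assumption, not a figure from Anthropic’s report, and `amdahl_speedup` is a name chosen for this example.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when only part of the work gets `factor` times faster."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# Hypothetical split: 60% of delivery time is writing code, 40% is human review.
# Only the writing portion is accelerated by AI.
for factor in (2, 10, 100):
    print(factor, round(amdahl_speedup(0.6, factor), 2))
# 2 1.43
# 10 2.17
# 100 2.46
```

Under that assumed split, even a 100x faster code writer delivers less than a 2.5x overall gain, because the unaccelerated review share caps the speedup at 1 / 0.4 = 2.5. The only way past the cap is to speed up review itself.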

One engineer quoted in the report noted they hadn’t written code themselves in five months. The role had shifted entirely to reviewing AI output and deciding what problems to tackle next. Anthropic calls the latter “research taste” — the judgment to choose which problem matters, which experiment to run, which direction to pursue. It’s the one area where Claude consistently underperforms when left unprompted. “Without that judgment,” the report states, “Claude is a capable assistant, but not a system that could drive AI progress on its own.”

That’s the practical takeaway for working developers: the scarcity has moved. Writing code faster is not the skill to optimize. Reviewing AI-generated code effectively and directing AI toward the right problems — that’s where career value is concentrated right now.

Calling for a Pause While Filing for a $965B IPO

The credibility problem with Anthropic’s pause call is obvious and real. The company filed for an IPO at a $965 billion valuation. It has raised $65 billion. It is actively building the technology it’s warning the world about. VC David Sacks has called this “regulatory capture” — using safety framing to slow down open-source competitors while Anthropic’s proprietary models continue advancing. As SiliconANGLE noted, there’s a visible tension between Anthropic’s safety messaging and its fundraising trajectory.

All of that may be true. But the underlying call still matters: Anthropic says some models could be capable of recursive self-improvement — fully autonomous AI designing its own successors — within two years. Their proposed solution is a verifiable global pause, where frontier labs can confirm that others have actually stopped. “Training runs are far easier to conceal than missile silos,” the report acknowledges, admitting the mechanism to verify such a pause doesn’t yet exist.

The pause is almost certainly not happening. Coordination across Google, OpenAI, Meta, Mistral, and others — across geopolitical lines and without enforcement mechanisms — is not a realistic near-term outcome. Nevertheless, Anthropic naming the scenario explicitly and putting a two-year timeline on it is different from what any frontier lab has said publicly before. Whether the motivation is genuine or strategic, the data behind the claim comes from their own internal production metrics — and that’s harder to dismiss than a press release. The Hacker News discussion (393 points) reflects this split: real skepticism about the metrics, but acknowledgment that the acceleration is happening.

Key Takeaways

Claude now authors more than 80% of Anthropic’s production code, with engineers shipping 8x more code per quarter — the most concrete public data on AI-driven development acceleration yet published.
Task completion horizons expanded from 4 minutes (2024) to 12 hours (2026), doubling every four months — a stronger signal of acceleration than any lines-of-code stat.
The bottleneck has already shifted from writing code to reviewing it and directing what gets built; “research taste” is the last area where AI consistently underperforms humans.
Anthropic’s pause call faces impossible coordination challenges and obvious credibility problems, but the underlying two-year recursive self-improvement timeline is new, specific, and backed by internal production data.
For developers: optimize for code review quality and problem selection, not writing speed — that’s where scarcity is moving.

ByteBot
