Claude Opus 4.7 launched on April 16, 2026, with three coding-specific features that will change how you use Claude for development. The xhigh effort level is now the default in Claude Code, balancing quality and token spend for agentic coding. Task budgets finally solve the runaway loop problem for long-horizon agents. And /ultrareview brings multi-agent code review directly into your workflow. These aren’t just benchmark improvements. SWE-bench Verified jumped from 80.8% to 87.6%, but the real story is what these features mean for your daily workflow.
xhigh Effort: The New Default You’re Already Using
If you opened Claude Code this week, you’re already using xhigh effort. Anthropic made it the default for all plans, replacing high as the recommended coding effort level. This isn’t a setting you need to toggle. It’s automatic.
Here’s why it matters. The xhigh effort level triggers Claude’s deep thinking mode more frequently than high. The model proactively reflects on intermediate results and backtracks on failed tool call paths. Internal Anthropic data shows xhigh provides a “compelling sweet spot” between performance and token expenditure. Max effort yields roughly 75% on coding tasks, but xhigh gets close without the runaway token usage.
The trade-off is token cost. xhigh roughly doubles token usage compared to high on long agentic runs. But you get adaptive thinking in return. Unlike earlier models, Opus 4.7 uses adaptive thinking only, where the model decides how many reasoning tokens to spend on a given turn. The effort knob biases that decision. At xhigh, Claude almost always thinks deeply.
When should you step down to a lower effort level? Use medium for cost-sensitive workloads where the task is well defined and the answer follows directly from the provided context. Use low only when the answer is obvious: syntax fixes, simple renames, code formatting. For most coding work, xhigh is the right default. Step up to max only when your evaluations show measurable headroom at xhigh.
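As a concrete sketch, here is how a client might pin an effort level explicitly instead of relying on the Claude Code default. The effort level names come from the text above, but the `effort` field name and the payload shape are assumptions for illustration, not a confirmed API schema.

```python
# Hypothetical request-payload builder. The "effort" field and the dict
# layout are assumptions; only the model name and level names come from
# the article.
EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Build a request body pinned to an explicit effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4.7",
        "effort": effort,  # xhigh is the default described above
        "messages": [{"role": "user", "content": prompt}],
    }
```

Pinning the level in the request, rather than depending on a client-side default, keeps cost behavior explicit when the same code path serves both throwaway fixes and long agentic runs.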
Task Budgets: Agents That Finish Gracefully
Developers deploying long-running agents faced a critical problem. Agents would either run forever, burning through tokens with no end in sight, or fail abruptly mid-action with no summary of progress. Task budgets solve this.
Task budgets let you tell Claude how many tokens it has for a full agentic loop, including thinking, tool calls, tool results, and output. The model sees a running countdown and uses it to prioritize work. As the budget exhausts, Claude finishes gracefully: it summarizes findings, reports progress, and wraps up instead of cutting off mid-sentence.
This is a soft hint, not a hard cap. Claude may occasionally exceed the budget if interrupting would be more disruptive than finishing an action. For a hard ceiling, combine task budgets with max_tokens: use task_budget as the target Claude paces against, and max_tokens as the absolute cap preventing runaway generation. The two values are independent; neither is required to be at or below the other.
Task budgets are currently in public beta. Set the task-budgets-2026-03-13 beta header to opt in. They work best for agentic workflows where Claude makes multiple tool calls and decisions before finalizing output. If you want Claude to self-regulate token spend on long-horizon tasks or enforce a predictable per-task cost ceiling, task budgets are the right tool.
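As a sketch of how the pieces combine, a request might carry the beta header alongside both limits. The header string and the names task_budget and max_tokens come from the text above; the exact headers-plus-body layout is an assumption, not a documented wire format.

```python
# Hypothetical request sketch combining the soft and hard limits. The
# beta header value is from the article; the payload layout is assumed.
def build_budgeted_request(prompt: str, task_budget: int, max_tokens: int):
    """Return (headers, body) for a task-budgeted agentic request."""
    headers = {"anthropic-beta": "task-budgets-2026-03-13"}  # opt-in
    body = {
        "model": "claude-opus-4.7",
        "task_budget": task_budget,  # soft target Claude paces against
        "max_tokens": max_tokens,    # hard cap on runaway generation
        "messages": [{"role": "user", "content": prompt}],
    }
    # Note: no validation relating the two limits; they are independent.
    return headers, body
```

Setting both gives you pacing from task_budget and a guaranteed ceiling from max_tokens, which is the combination the text recommends for predictable per-task cost.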
One critical detail: the task budget counts only what Claude sees this turn, not the full conversation payload. In an agentic loop, your client resends the full conversation on every request, so the payload grows turn over turn. But the budget decrements only by the tokens Claude sees in the current turn. This keeps budgeting predictable even as conversation history expands.
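A toy simulation makes the accounting concrete: the resent payload grows cumulatively every turn, but the budget is charged only for the current turn's tokens. The turn sizes and the structure here are illustrative, not Anthropic's actual accounting code.

```python
# Toy model of per-turn budget accounting as described above: payload
# grows turn over turn, but the budget decrements per-turn only.
def simulate_budget(turn_token_counts, task_budget):
    """Return a per-turn log of (cumulative_payload, remaining_budget)."""
    remaining = task_budget
    payload = 0
    log = []
    for turn_tokens in turn_token_counts:
        payload += turn_tokens   # client resends everything: payload grows
        remaining -= turn_tokens  # budget charged for this turn only
        log.append((payload, remaining))
    return log
```

Because the countdown tracks turns rather than the ballooning payload, a 1,000-token budget behaves the same on turn one and turn twenty, which is what makes pacing predictable for long loops.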
/ultrareview: Multi-Agent Code Review in 5-10 Minutes
Type /ultrareview in Claude Code and you get comprehensive multi-agent code review. Multiple specialized AI reviewers analyze your diff in parallel. Each agent focuses on different concerns: security issues, quality and style, test coverage, logic errors, and edge cases.
The architecture shows where AI tooling is heading. Opus-class sub-agents handle bugs and logic issues. Sonnet-class agents handle style concerns and policy violations. Security reviewers check input validation, injection risks like SQL and XSS, authentication and authorization issues, secrets exposure, and error handling that leaks sensitive information. Quality reviewers evaluate complexity, dead code, duplication, and adherence to project conventions. Test reviewers assess coverage ROI and flakiness risks.
The verification pass is the core value proposition. After the parallel analysis, a verification stage confirms candidates and filters false positives. Style suggestions that don’t rise to the level of bugs get filtered out. Edge cases already handled elsewhere get deduplicated. Only confirmed findings make it into the final report.
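The fan-out-then-verify shape can be sketched in a few lines. The reviewer functions below are placeholder string matching standing in for Claude's actual analysis; the function names and the dedup-style verification are illustrative only.

```python
# Toy sketch of the /ultrareview pattern described above: specialized
# reviewers run in parallel over a diff, then a verification pass
# dedupes and filters the combined candidates.
from concurrent.futures import ThreadPoolExecutor

def security_reviewer(diff_lines):
    # Placeholder check standing in for real security analysis.
    return [("security", line) for line in diff_lines if "password" in line]

def quality_reviewer(diff_lines):
    # Placeholder check standing in for real quality analysis.
    return [("quality", line) for line in diff_lines if "TODO" in line]

def verify(candidates):
    # Placeholder verification: deduplicate, keep a stable order.
    return sorted(set(candidates))

def ultrareview(diff_lines):
    reviewers = [security_reviewer, quality_reviewer]
    with ThreadPoolExecutor() as pool:
        batches = pool.map(lambda review: review(diff_lines), reviewers)
    return verify([finding for batch in batches for finding in batch])
```

The structural point survives the toy logic: parallel specialists generate candidates cheaply, and a single verification stage owns the precision of the final report.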
Runtime is approximately 5 to 10 minutes with a cost of $5 to $20 per run. Use /ultrareview before merging critical code, on security-sensitive pull requests, or during large refactors affecting multiple files. Don’t use it on every trivial PR. The cost and runtime make sense for high-stakes merges, not routine style fixes.
When to Upgrade to Opus 4.7
If you’re using Claude Code, you already upgraded. The xhigh default applied automatically. If you’re using the Claude API directly, set model="claude-opus-4.7" explicitly to upgrade.
Should you upgrade from Opus 4.5 or 4.6? The SWE-bench improvements are substantial. SWE-bench Verified jumped from 80.8% to 87.6%, a 6.8-point gain. SWE-bench Pro climbed from 53.4% to 64.3%, a 10.9-point jump. Claude now beats GPT-5.4 Pro at 57.7% and Gemini 3.1 Pro at 54.2% on SWE-bench Pro, making it the clear leader for autonomous software engineering tasks.
But there’s a catch. Opus 4.7 ships with a new tokenizer that produces 35% to 40% more tokens for the same input text. While Anthropic kept per-token pricing unchanged at $5 per million input tokens and $25 per million output tokens, your real bill per request can jump 1.5x to 3x. This is the “tokenizer tax.” The rate card didn’t change, but token counts did.
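A back-of-envelope calculation shows the effect. The rate card is from the text above; the 40% token inflation and the request sizes are illustrative.

```python
# "Tokenizer tax" arithmetic: the per-token prices are unchanged (from
# the article), but the new tokenizer emits more tokens per request.
INPUT_PRICE = 5 / 1_000_000    # $ per input token
OUTPUT_PRICE = 25 / 1_000_000  # $ per output token

def request_cost(input_tokens, output_tokens, inflation=1.0):
    """Dollar cost of one request at a given token-inflation factor."""
    return (input_tokens * inflation * INPUT_PRICE
            + output_tokens * inflation * OUTPUT_PRICE)

baseline = request_cost(10_000, 2_000)        # old tokenizer
inflated = request_cost(10_000, 2_000, 1.4)   # 40% more tokens
```

Single-request inflation alone lands near 1.4x; the higher end of the 1.5x to 3x range presumably comes from multi-turn agentic loops, where inflated outputs are resent as inputs on every subsequent turn.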
Upgrade if you’re running complex agentic tasks, need the vision improvements (3.3x higher resolution at 2,576 pixels), or use Claude Code. Stay on Opus 4.6 or Sonnet 4.6 if you’re budget-constrained or working with simple chat and writing tasks. For most production inference, Sonnet 4.6 remains the right default at 40% cheaper per token.
What This Means
Claude Opus 4.7 makes agentic coding production-ready. Task budgets eliminate the runaway loop problem. xhigh effort balances quality and cost for repeated tool calling. /ultrareview brings multi-agent code review directly into your workflow.
The competitive benchmarks confirm Claude’s coding leadership. An 87.6% SWE-bench Verified score puts Opus 4.7 ahead of GPT-5.4 Pro and Gemini 3.1 Pro. But the practical features matter more than the benchmarks. Developers can finally deploy long-running agents with confidence, knowing they’ll finish gracefully within budget.
Try the features this week. xhigh is already your default in Claude Code. Test task budgets on long-horizon agentic workflows. Run /ultrareview on your next critical pull request. These aren’t research demos. They’re production tools you can use today.