Claude Advisor Tool: Opus Intelligence at Sonnet Prices

You’re running a Claude-powered coding agent on Sonnet 4.6 because Opus is too expensive for sustained agentic work. The math is brutal: Opus runs $15/MTok input versus Sonnet’s $3. You know the agent would make smarter architectural decisions on Opus, but “smarter” has a $12-per-million-token surcharge. Anthropic’s Advisor Tool, now updated and available in beta, changes this calculation. Sonnet runs the task end-to-end. Opus only weighs in when Sonnet hits a decision it can’t resolve confidently. The result: 74.8% on SWE-bench Multilingual versus 72.1% for Sonnet solo — and 11.9% cheaper than running Opus for everything.

One API Call, Two Models

The Advisor Tool is a server-side feature, not a wrapper you build yourself. You set Sonnet (or Haiku) as the executor model, add a beta header, and include the advisor tool definition in your tools array. From that point, the executor runs your task as normal. When it hits a wall — a multi-step planning problem, a debugging impasse, a complex refactor decision — it invokes the advisor tool with an empty input. Anthropic’s API handles the rest: Opus receives the full conversation transcript automatically, produces a strategic plan (typically 400–700 text tokens), and the executor continues with that context.

No extra API calls from your client. No context reconstruction. No custom orchestration layer. Everything happens within a single /v1/messages request, with billing split cleanly between executor and advisor rates in the usage.iterations array.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",          # executor
    max_tokens=8192,
    tools=[{
        "type": "advisor_20260301",
        "name": "advisor",
        "model": "claude-opus-4-6",     # advisor
        "max_uses": 3,                  # cap advisor calls per request
        "max_tokens": 2048,             # cap advisor output (June 2026)
    }],
    messages=[{"role": "user", "content": "your agentic task here"}],
    extra_headers={"anthropic-beta": "advisor-tool-2026-03-01"},
)
```

That’s the full setup. Two parameters that weren’t in your previous call: the beta header and the tool definition. No prompt changes required. The executor decides when to call the advisor based on its own assessment.
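Because billing is split between executor and advisor rates in the usage.iterations array, it can help to tally cost per model after each request. Below is a minimal sketch under stated assumptions: the record shape (`model`, `input_tokens`, `output_tokens` keys) is hypothetical and stands in for whatever fields usage.iterations actually exposes, so map them over after checking the API reference. The input rates are the $3 and $15 per million tokens quoted in this article; the output rates are placeholders, not quoted prices.

```python
from collections import defaultdict

# USD per million tokens: (input_rate, output_rate).
# Input rates are the figures quoted in this article; the output rates are
# PLACEHOLDERS, so replace both with current published pricing before use.
PRICES = {
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
}


def cost_by_model(iterations, prices=PRICES):
    """Sum USD cost per model across a request's iteration records.

    ASSUMPTION: each record is a dict with "model", "input_tokens" and
    "output_tokens" keys. This shape is illustrative; adapt it to the real
    usage.iterations fields.
    """
    totals = defaultdict(float)
    for it in iterations:
        in_rate, out_rate = prices[it["model"]]
        totals[it["model"]] += (
            it["input_tokens"] * in_rate + it["output_tokens"] * out_rate
        ) / 1_000_000
    return dict(totals)


# Hypothetical request: executor turn, one advisor call, executor turn.
iterations = [
    {"model": "claude-sonnet-4-6", "input_tokens": 40_000, "output_tokens": 3_000},
    {"model": "claude-opus-4-6", "input_tokens": 42_000, "output_tokens": 600},
    {"model": "claude-sonnet-4-6", "input_tokens": 43_000, "output_tokens": 2_000},
]
print(cost_by_model(iterations))
```

Keeping prices in one table makes them easy to update when pricing changes, and the per-model totals show at a glance how much of a task's spend went to the advisor.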

The Numbers

Anthropic published benchmarks across two model pairings:

Sonnet 4.6 + Opus advisor: SWE-bench Multilingual improved from 72.1% (Sonnet solo) to 74.8%, while total cost per agentic task dropped 11.9% compared to running Opus for everything. You get better results than Sonnet alone while spending less than Opus alone. The catch: you’re still spending more than Sonnet alone, so this pairing only makes sense if you’re already routing hard tasks to Opus.

Haiku 4.5 + Opus advisor: This is where it gets interesting. Haiku’s BrowseComp score more than doubled, from 19.7% to 41.2%, while costing 85% less per task than Sonnet solo. For high-volume workloads currently running on Sonnet, switching to Haiku with an advisor may deliver comparable quality at a fraction of the price, though that is worth confirming on your own tasks. A 100k-token all-Opus session runs $15–$20. The same session on Haiku with Opus advising on roughly 5% of output tokens costs $4–$6.

The conventional assumption that a smaller model means worse results holds up less well here: the advisor steps in at the decision points where the smaller model struggles most.

The June 2026 Update: Capping Advisor Output

The most recent change adds a max_tokens parameter to the advisor tool definition. Before this, Opus could respond with as many tokens as a given reasoning task warranted, which made hard tasks expensive and unpredictable. Setting max_tokens: 2048 — Anthropic’s recommended starting point — reduces mean advisor output by roughly 7x with near-zero quality degradation in testing. The minimum value of 1024 cuts output ~10x but truncates around 10% of calls, which may be acceptable depending on your task.

Without a cap, expect longer advisor output on hard reasoning tasks, the kind that genuinely stress the advisor; the max_tokens cap bounds that output and makes cost predictable. For a deeper look at how Anthropic designed the advisor strategy and the benchmark methodology, the official Anthropic post has the full breakdown.
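The cap also gives a simple worst-case bound: with `max_uses` calls per request, each limited to `max_tokens`, advisor output per request cannot exceed their product. Here is a minimal sketch of that arithmetic with a hypothetical `advisor_tool` helper that builds the tool definition from the setup example and rejects caps below the 1024 minimum mentioned above. It bounds advisor output only; the advisor's input (the full transcript) still grows with the conversation. The output price is left as a caller-supplied parameter rather than a quoted figure.

```python
ADVISOR_MIN_MAX_TOKENS = 1024  # minimum cap value cited in this article


def advisor_tool(model="claude-opus-4-6", max_uses=3, max_tokens=2048):
    """Build the advisor tool definition used in the setup example.

    HYPOTHETICAL helper: the keys mirror that example; the validation is
    this sketch's own client-side check, not documented API behavior.
    """
    if max_tokens < ADVISOR_MIN_MAX_TOKENS:
        raise ValueError(f"advisor max_tokens must be >= {ADVISOR_MIN_MAX_TOKENS}")
    if max_uses < 1:
        raise ValueError("max_uses must be at least 1")
    return {
        "type": "advisor_20260301",
        "name": "advisor",
        "model": model,
        "max_uses": max_uses,
        "max_tokens": max_tokens,
    }


def worst_case_advisor_output(tool):
    """Upper bound on advisor output tokens in one request."""
    return tool["max_uses"] * tool["max_tokens"]


def worst_case_advisor_output_cost(tool, output_rate_per_mtok):
    """Upper bound in USD, given the advisor's output price per million tokens."""
    return worst_case_advisor_output(tool) * output_rate_per_mtok / 1_000_000


tool = advisor_tool()                    # max_uses=3, max_tokens=2048
print(worst_case_advisor_output(tool))   # 6144
```

Dropping the cap to the 1024 minimum halves this bound to 3072 tokens, at the cost of the truncation rate noted above.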

When This Works and When It Doesn’t

The advisor tool fits a specific profile: multi-step agentic tasks with genuine decision points. Coding agents, computer use workflows, and multi-step research pipelines all qualify. In these contexts, a smarter plan mid-task meaningfully changes the final output.

Skip it for single-turn queries: if the user asks the agent to summarize a document in one step, the executor is unlikely to invoke the advisor, and you’ve added a tool definition for nothing. Skip it for mechanical tasks like data formatting, regex transformations, or lookups. And skip it for latency-critical, user-facing paths: the advisor sub-inference does not stream. The executor’s stream pauses while Opus runs, then the full advisor response arrives at once. For background agents, this is fine. For a UI where users are watching a cursor, it creates a noticeable gap.
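One way to apply these rules is to attach the advisor conditionally when building the request. The sketch below uses a hypothetical `build_request` helper and two flags, `multi_step` and `latency_critical`, that your own code would set; the API knows nothing about them. The tool definition and beta header are copied from the setup example, and any tools your agent already uses are passed through unchanged.

```python
def build_request(task, *, multi_step, latency_critical, existing_tools=(),
                  executor="claude-sonnet-4-6", advisor="claude-opus-4-6"):
    """Return kwargs for client.messages.create(**kwargs).

    HYPOTHETICAL policy helper: the advisor is attached only for multi-step
    work on paths where a non-streaming pause is acceptable.
    """
    kwargs = {
        "model": executor,
        "max_tokens": 8192,
        "messages": [{"role": "user", "content": task}],
    }
    tools = list(existing_tools)  # keep whatever tools the agent already uses
    if multi_step and not latency_critical:
        tools.append({
            "type": "advisor_20260301",
            "name": "advisor",
            "model": advisor,
            "max_uses": 3,
            "max_tokens": 2048,
        })
        kwargs["extra_headers"] = {"anthropic-beta": "advisor-tool-2026-03-01"}
    if tools:
        kwargs["tools"] = tools
    return kwargs


# Background refactor: multi-step, nobody watching the stream -> advisor attached.
background = build_request("Refactor the auth module",
                           multi_step=True, latency_critical=False)
# Single-step, user-facing summary -> plain request.
chat = build_request("Summarize this document",
                     multi_step=False, latency_critical=True)
print("tools" in background, "tools" in chat)  # True False
```

The returned dict can be passed straight to `client.messages.create(**kwargs)`. Keeping the policy in one function means the advisor can be switched off for interactive paths without touching the rest of the agent.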

If you’re integrating via a framework, LiteLLM now supports the advisor tool natively, so you don’t need to hand-craft the beta header in your own proxy layer.

Worth Adding?

If you’re building agentic workloads on Claude today, yes. The setup cost is genuinely two additions: a beta header and a tool definition. The Sonnet + advisor pairing is a quality upgrade at lower cost than all-Opus. The Haiku + advisor pairing is a cost reduction with a larger-than-expected quality jump. The June 2026 max_tokens addition makes it predictable enough for production.

The advisor tool is still in beta, and the stream-pause behavior is a real constraint for interactive applications. But for background agents and batch workloads, this is arguably the most cost-effective path to Opus-level decision-making available in the Claude API today. Check the official advisor tool documentation for the full parameter reference before deploying.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.