Claude Advisor Tool API: Get Opus-Level Intelligence at Sonnet Prices

Digital illustration showing two AI brain nodes connected by glowing blue data streams, representing Claude's advisor tool pattern where Opus guides Haiku

Anthropic's advisor tool API: Haiku executes, Opus advises — all in one API call

Anthropic just shipped a way to run Haiku on your agent loops and pull in Opus only when a decision is actually hard — all inside a single API call, no orchestration code required. The result speaks for itself: Haiku with an Opus advisor scored 41.2% on BrowseComp, more than double Haiku solo’s 19.7%, at 85% less cost than running Opus end-to-end. The tool is in beta. It’s called the advisor tool, and if you’re building production Claude agents, you should care.

The Real Cost of Running Opus Everywhere

Opus is 5x more expensive than Haiku per token and nearly 1.7x more expensive than Sonnet. In a demo, that doesn’t matter. In production, where your agent might run hundreds of turns across thousands of daily tasks, you’re burning money on model intelligence that most of those turns simply don’t need.

The developer workaround until now was manual orchestration: run Sonnet, detect when a decision was too complex, fire a separate Opus call, pass the result back into context, and resume. That approach works, but it adds complexity, requires you to correctly identify escalation points, and introduces an extra network roundtrip for every escalation. Most teams either overkill with Opus or skip the escalation pattern entirely and live with Sonnet’s occasional misses.

The advisor tool collapses that entire pattern into your existing API call.

How It Works

When you add the advisor tool to your tools array, the executor model — Sonnet 4.6 or Haiku 4.5 — runs the task end-to-end as normal. It calls tools, reads results, and iterates. When it encounters a decision it can’t confidently resolve, it emits a server_tool_use block with the advisor tool name. At that point, Anthropic runs a separate inference pass on Opus server-side: Opus receives the complete shared context — your system prompt, all tool definitions, every prior turn and tool result — and returns guidance, typically 400–700 text tokens. The executor then resumes with that guidance in context.

From your perspective as the developer, this is still one /v1/messages call. You don’t manage the context handoff. You don’t coordinate two API calls. The advisor never calls tools and never produces user-facing output — it only advises the executor.

The API Setup

Two changes from your existing Messages API code: add the beta header and declare the advisor tool in your tools array.

import anthropic

client = anthropic.Anthropic()

ADVISOR_TOOL = {
    "type": "advisor_20260301",
    "name": "advisor",
    "model": "claude-opus-4-7",   # the advisor (Opus)
    "max_uses": 3,                 # cap how many times Opus is consulted per request
}

response = client.beta.messages.create(
    model="claude-sonnet-4-6",     # the executor
    tools=[*your_existing_tools, ADVISOR_TOOL],
    messages=messages,
    betas=["advisor-tool-2026-03-01"],  # required beta header
)

That’s the full setup. The executor model handles the rest — it decides when the task warrants consulting Opus, not you. The max_uses parameter caps how many times Opus gets called per request, giving you a ceiling on advisor token spend. Set it between 2 and 4 for most agent workloads.

Do the Benchmarks Hold Up?

Anthropic published numbers. Here’s what the configurations actually look like side by side:

Configuration	Benchmark	Score	Cost vs. Opus solo
Haiku 4.5 solo	BrowseComp	19.7%	~80% less
Haiku 4.5 + Opus advisor	BrowseComp	41.2%	~85% less
Sonnet 4.6 solo	SWE-bench Multilingual	72.1%	~40% less
Sonnet 4.6 + Opus advisor	SWE-bench Multilingual	74.8%	~12% less

The Haiku combination is the interesting one. A 2x improvement in task performance at 85% savings versus Opus is not a marginal win. It means Haiku — the model you’d normally rule out for complex agentic tasks — becomes a legitimate choice for production workloads where most steps are mechanical but some decisions genuinely need intelligence.

Sonnet with an Opus advisor is a tighter trade: you gain 2.7 percentage points on SWE-bench Multilingual and cut cost by 11.9% versus running Opus solo. If you’re currently running Opus on all your coding agent tasks, that’s a straightforward swap worth making.

When to Use It — and When to Skip It

The advisor pattern fits workloads where most turns are mechanical and a minority of decisions are genuinely hard. Good targets: long coding agent loops, security audit pipelines, research agents that fetch and summarize data, code review bots that need to classify risk, and multi-step compliance workflows.

It’s a poor fit for single-turn Q&A (there’s nothing to escalate), pure model routers where users pick their own model, or workloads where every single turn legitimately needs Opus-level reasoning. If you’re calling the advisor on every turn, you’ve lost the cost benefit and added latency for nothing.

Two Gotchas Worth Knowing

Streaming pauses. Advisor output doesn’t stream. When the executor decides to consult Opus, your stream pauses while the sub-inference runs. For latency-sensitive applications, keep max_uses low — 1 or 2 — to minimize interruptions. For batch processing or background agents where latency doesn’t matter, this is a non-issue.

Billing tracking. The top-level usage field in the API response only reflects executor tokens. Advisor tokens appear in the iterations array under entries with type: "advisor_message". If your cost monitoring reads only the top-level usage, you’ll undercount. Update your instrumentation before shipping to production.

This Is the Direction

The advisor tool is in beta and that will change. What won’t change is the underlying pattern: Anthropic is building infrastructure that lets developers access high-intelligence reasoning at low-intelligence prices for the turns that don’t need it. The manual orchestration approach was always the right idea — it was just too brittle to use in practice. This makes it a two-line API change.

Full documentation is in Anthropic’s advisor tool API docs. The official post covering the advisor strategy rationale is worth reading before you ship. For a worked example with a security audit agent, Builder.io published a thorough cost breakdown walkthrough showing the pattern end-to-end. And if you want independent benchmarks with different task types, this developer writeup covers the tradeoffs in depth.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.

Claude Advisor Tool API: Get Opus-Level Intelligence at Sonnet Prices

The Real Cost of Running Opus Everywhere

How It Works

The API Setup

Do the Benchmarks Hold Up?

When to Use It — and When to Skip It

Two Gotchas Worth Knowing

This Is the Direction

Google Antigravity 2.0: What Developers Need to Know

WebMCP: Chrome Turns Websites Into AI Agent Tools (2026)

Leave a reply Cancel reply

More in:AI & Development

LLM API Costs Dropped 94%: What to Fix in Your Architecture Now

40% of Agentic AI Projects Will Be Canceled by 2027. Here Is Why Yours Might Be One of Them.

AMD Helios Hits Azure: 72 GPUs, 31 TB HBM4, Rival Nvidia

EU AI Act August 2: What Developers Must Do Now

GPT-5.6 Sol, Terra, and Luna: Developer Guide and Migration

Grok Build Goes Open Source After Secretly Uploading Your Code

Categories

The Real Cost of Running Opus Everywhere

How It Works

The API Setup

Do the Benchmarks Hold Up?

When to Use It — and When to Skip It

Two Gotchas Worth Knowing

This Is the Direction

Share

You may also like

Leave a reply Cancel reply

More in:AI & Development

Categories

Latest Posts