
Anthropic just shipped something that breaks the usual AI API trade-off. The advisor tool, now in public beta, lets you run Sonnet or Haiku as your executor while Opus steps in as a strategic advisor: server-side, mid-generation, in a single API call. The result in early benchmarks: Sonnet with an Opus advisor scores 74.8% on SWE-bench Multilingual, up from 72.1% running alone, while cutting cost per agentic task by 11.9% compared to Opus solo. Higher quality than Sonnet alone. Lower cost than Opus alone. For agentic workloads, the usual trade-off mostly stops applying.
The Problem This Solves
Running Opus end-to-end on long agentic tasks is expensive. Most of the work in a typical coding agent — parsing tool output, calling APIs, writing boilerplate, formatting responses — doesn’t need Opus-level reasoning. It needs throughput. But Sonnet alone sometimes misses the strategy at critical decision points: where to refactor, how to structure a multi-step plan, when to stop and reconsider. Developers have worked around this by manually routing between models — extra API calls, separate orchestration layers, context syncing between requests. It’s messy and adds latency.
The advisor tool collapses that pattern into the executor’s own context. When the executor hits a decision it cannot confidently resolve, it calls the advisor server-side. Opus reads the full transcript, returns a short plan or correction (typically 400 to 700 text tokens), and the executor continues. No round trips. No orchestration code. One API call. Learn more in the official advisor tool documentation.
How to Add It to Your Agent
The implementation is minimal. Add the beta header, include the advisor tool in your tools array, and swap messages.create for beta.messages.create:
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-6",  # executor
    max_tokens=4096,
    betas=["advisor-tool-2026-03-01"],
    tools=[
        {
            "type": "advisor_20260301",
            "name": "advisor",
            "model": "claude-opus-4-6",  # advisor
            "max_uses": 3,  # cap advisor calls per request
        }
    ],
    messages=[{"role": "user", "content": "YOUR_TASK_HERE"}],
)
Three supported model pairs: Haiku 4.5 + Opus 4.6, Sonnet 4.6 + Opus 4.6, or Opus 4.6 + Opus 4.6. The max_uses parameter caps advisor invocations per request. Advisor tokens appear separately in the usage block as advisor_input_tokens and advisor_output_tokens, so you can track Opus spend independently. Read Anthropic’s full breakdown in The Advisor Strategy blog post.
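If you switch between the three pairs, a small wrapper can catch an unsupported combination before the request goes out. The sketch below is illustrative only: build_advisor_request and SUPPORTED_PAIRS are hypothetical names, the Sonnet and Opus model IDs are taken from the snippet above, and the Haiku ID is an assumed placeholder that you should check against the model list. It only builds the keyword arguments; it does not call the API.

```python
# Model pairs listed in the article. The Sonnet and Opus IDs come from the
# snippet above; the Haiku ID is an assumed placeholder, so verify it.
SUPPORTED_PAIRS = {
    ("claude-haiku-4-5", "claude-opus-4-6"),
    ("claude-sonnet-4-6", "claude-opus-4-6"),
    ("claude-opus-4-6", "claude-opus-4-6"),
}


def build_advisor_request(executor, advisor, task, max_uses=3, max_tokens=4096):
    """Return keyword arguments for client.beta.messages.create (hypothetical helper)."""
    if (executor, advisor) not in SUPPORTED_PAIRS:
        raise ValueError(f"unsupported executor/advisor pair: {executor} + {advisor}")
    if max_uses < 1:
        raise ValueError("max_uses must be at least 1")
    return {
        "model": executor,
        "max_tokens": max_tokens,
        "betas": ["advisor-tool-2026-03-01"],
        "tools": [
            {
                "type": "advisor_20260301",
                "name": "advisor",
                "model": advisor,
                "max_uses": max_uses,  # cap advisor calls per request
            }
        ],
        "messages": [{"role": "user", "content": task}],
    }


kwargs = build_advisor_request("claude-sonnet-4-6", "claude-opus-4-6", "YOUR_TASK_HERE")
# With a configured client, the request would then be sent as:
# response = client.beta.messages.create(**kwargs)
```

Keeping the pair check in one place means a typo in a model name fails locally with a clear error instead of surfacing as an API rejection mid-pipeline.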
The Numbers You Can Quote in a Standup
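On SWE-bench Multilingual, a benchmark for agentic coding across languages, Sonnet with an Opus advisor hits 74.8%, compared to 72.1% for Sonnet solo and approximately 76% for Opus solo. That is a 2.7 percentage point improvement over Sonnet solo, at 11.9% lower cost than running Opus end-to-end. It is not strictly better than both solo options: it still trails Opus solo by roughly a point, and the advisor calls add Opus-rate tokens on top of Sonnet’s own cost. For most teams running agentic workloads, though, it is the better balance: most of Opus’s score without paying Opus rates on every step.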
The Haiku story is even more dramatic. Haiku with an Opus advisor scores 41.2% on BrowseComp, more than double Haiku’s solo score of 19.7%. It costs 85% less per task than Sonnet solo, though it trails Sonnet solo’s score by 29%. If you’re running a high-volume pipeline where Sonnet is overkill for most steps but you occasionally need strategic judgment, Haiku plus advisor is worth a serious look.
When to Use It (and When Not To)
This pattern fits long-horizon agentic workloads where most steps are mechanical but strategic decisions matter. Coding agents, multi-step research pipelines, and computer use are the canonical use cases. Add it when your executor is solid on execution but occasionally makes poor strategic choices mid-task.
Skip it for single-turn completions, simple Q&A, or tasks where every step genuinely needs Opus reasoning. If the task is complex enough that Opus should be driving the whole thing, just use Opus. The advisor pattern is for tasks with high variance in cognitive demand across steps — where most decisions are easy but a few are genuinely hard. The builder.io advisor pattern guide covers more real-world scenarios worth reading.
Cost Mechanics
Executor tokens are billed at executor rates. Advisor tokens are billed at Opus rates. Since the advisor only generates a short plan, not the full output, the Opus cost per request stays low. At current Claude API pricing, Sonnet runs at $3/$15 per million tokens (input/output) and Opus at $5/$25. A typical advisor invocation adds 1,400 to 1,800 tokens at Opus rates, roughly $0.01 to $0.02 per consultation, while the full response generates at Sonnet cost. Set max_uses: 3 on a long task and, at that typical consultation size, the Opus portion of your bill stays at a few cents no matter how long the executor’s response runs. Because the advisor reads the transcript, very long contexts can push the per-consultation figure higher, so confirm it against the usage block.
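The per-consultation figure is easy to check by hand. The sketch below applies the Opus prices quoted above to the two advisor usage fields; advisor_cost_usd is a hypothetical helper, and the input/output split of the 1,400 and 1,800 token totals is an assumption made for illustration (the article gives only the totals and the 400 to 700 token plan size).

```python
# Opus prices quoted in the article, in USD per million tokens.
OPUS_INPUT_PER_MTOK = 5.00
OPUS_OUTPUT_PER_MTOK = 25.00


def advisor_cost_usd(advisor_input_tokens, advisor_output_tokens):
    """Estimate Opus spend for the advisor portion of one request."""
    input_cost = advisor_input_tokens * OPUS_INPUT_PER_MTOK / 1_000_000
    output_cost = advisor_output_tokens * OPUS_OUTPUT_PER_MTOK / 1_000_000
    return input_cost + output_cost


# Assumed split: 1,000 in + 400 out (1,400 total), 1,100 in + 700 out (1,800 total).
low = advisor_cost_usd(1_000, 400)
high = advisor_cost_usd(1_100, 700)
print(f"per consultation: ${low:.4f} to ${high:.4f}")
print(f"three consultations (max_uses=3): up to ${3 * high:.4f}")

# With a real response, the inputs would come from the usage block, e.g.
# advisor_cost_usd(usage.advisor_input_tokens, usage.advisor_output_tokens),
# assuming the fields are exposed as attributes with those names.
```

Under these assumed splits a consultation costs about 1.5 to 2.3 cents, in line with the rough $0.01 to $0.02 figure, and three consultations stay under seven cents.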
The advisor tool is in public beta, which means the API surface may change. Given the community traction this pattern has already built — builder.io, daily.dev, LiteLLM, and Roo Code all have active discussions — it’s reasonable to expect it graduates to stable API within the year. If you’re already running Claude Sonnet on agentic tasks, adding the advisor tool is a single configuration change. Test it on your existing benchmark, check the advisor_input_tokens count to see how often Opus actually gets invoked, and let the data tell you whether it fits your workload.
