Logo
FrontierNews.ai

Anthropic's Advisor Tool Lets Smaller Claude Models Think Like Opus Without the Price Tag

Anthropic has released an updated Advisor Tool that lets developers run cheaper Claude models for most tasks while calling in the more powerful Opus model only when genuinely difficult decisions arise. The result is a significant cost reduction paired with performance gains that challenge the conventional assumption that bigger models always mean better results.

The problem the Advisor Tool solves is straightforward: Opus costs five times more than Sonnet per million tokens processed, making it prohibitively expensive for sustained agentic work, where AI systems autonomously execute multi-step tasks. Yet Sonnet alone sometimes struggles with complex architectural decisions or debugging impasses that would benefit from Opus-level reasoning. The Advisor Tool bridges this gap by letting Sonnet handle the task end-to-end, automatically invoking Opus only when it hits a decision point it cannot resolve confidently.

How Does the Advisor Tool Actually Work?

The setup is remarkably simple. Developers add a beta header and a tool definition to their API request, specifying Sonnet (or Haiku) as the executor model and Opus as the advisor. When the executor encounters a complex decision, it invokes the advisor tool with the full conversation transcript. Opus produces a strategic plan, typically 400 to 700 tokens, and the executor continues with that guidance. Everything happens within a single API request, with billing split cleanly between executor and advisor rates.

The June 2026 update adds a critical parameter: max_tokens for the advisor output. Before this change, Opus could respond with as many tokens as needed, making hard tasks unpredictable and expensive. Setting max_tokens to 2,048, Anthropic's recommended starting point, reduces mean advisor output by roughly 7 times with near-zero quality loss. The minimum value of 1,024 cuts output by 10 times but may truncate around 10% of calls, depending on task complexity.

What Are the Real-World Performance Gains?

Anthropic published benchmarks across two model pairings that reveal the tool's practical impact. On SWE-bench Multilingual, a coding task benchmark, Sonnet 4.6 paired with an Opus advisor improved from 72.1% accuracy when running solo to 74.8% with advisory support, while total cost per task dropped 11.9% compared to running Opus for everything. This pairing makes sense for teams already routing hard tasks to Opus; the advisor approach delivers better results at lower cost.

The more dramatic gains appear with Haiku 4.5, Anthropic's smallest model. On BrowseComp, a web-browsing task, Haiku's score jumped from 19.7% to 41.2%, more than doubling, while costing 85% less per task than Sonnet alone. A 100,000-token session running entirely on Opus costs $15 to $20; the same session on Haiku with Opus advising on roughly 5% of output tokens costs $4 to $6. For high-volume workloads where cost matters, this pairing offers roughly equivalent quality at a fraction of the price.

When Should You Use the Advisor Tool?

  • Best Use Cases: Multi-step agentic tasks with genuine decision points, including coding agents, computer use workflows, and multi-step research pipelines where a smarter plan mid-task meaningfully changes the final output.
  • Skip for Single-Turn Queries: If a user asks the agent to summarize a document in one step, the executor won't invoke the advisor, making the tool definition unnecessary overhead.
  • Avoid for Latency-Critical Paths: The advisor sub-inference does not stream. The executor's stream pauses while Opus runs, then the full advisor response arrives at once. For background agents this is fine, but for user-facing interfaces where users watch a cursor, it creates a noticeable gap.
  • Skip Mechanical Tasks: Data formatting, regex transformations, and lookups don't benefit from advisory reasoning and add unnecessary complexity.

The Advisor Tool remains in beta, and the stream-pause behavior is a real constraint for interactive applications. However, for background agents and batch workloads, this approach offers the most cost-effective path to Opus-level decision-making available in the Claude API today. Framework support is expanding; LiteLLM now supports the advisor tool natively, eliminating the need to hand-craft the beta header in custom proxy layers.

For teams building agentic workloads on Claude, the setup cost is minimal: just a header and a tool definition. The Sonnet plus advisor pairing delivers a quality upgrade at lower cost than running Opus for everything. The Haiku plus advisor pairing offers substantial cost reduction with a larger-than-expected quality jump. The June 2026 max_tokens addition makes costs predictable enough for production environments, addressing the unpredictability that plagued earlier versions.