Anthropic's Claude Sonnet 5 Closes the Gap With Its Flagship Model at a Fraction of the Cost
Anthropic released Claude Sonnet 5 on Tuesday, positioning it as the company's most capable mid-tier model yet and a cost-effective alternative to its flagship Opus 4.8 for developers building autonomous AI agents. The new model performs nearly as well as Opus 4.8 on complex reasoning, tool use, and coding tasks, but at significantly lower cost, marking a shift in how foundation model companies compete on price-to-performance for agentic applications (Source 1, 2, 3).
Sonnet 5 replaces Sonnet 4.6 as the default model across all Claude plans, including free, Pro, Max, Team, and Enterprise tiers. The model is available immediately through Anthropic's API and web interface. Through August 31, Anthropic is offering introductory pricing of $2 per million input tokens and $10 per million output tokens, after which prices will rise to $3 and $15 per million tokens, respectively (Source 2, 3).
How Does Sonnet 5 Compare to Anthropic's Flagship Model?
On several key benchmarks, Sonnet 5 approaches the performance of Opus 4.8, the company's most powerful model. On agentic coding tasks, Sonnet 5 scored 63.2%, compared to Opus 4.8's 69.2% and the previous Sonnet 4.6's 58.1%. On knowledge work benchmarks, Sonnet 5 slightly outperformed Opus 4.8, though Anthropic notes that Opus 4.8 remains the better choice for tasks requiring the highest accuracy levels (Source 1, 3).
The real-world difference shows up in how the model behaves during complex tasks. According to Anthropic's testing, Sonnet 5 now completes multi-step workflows where previous Sonnet versions would stop partway through. Daniel Shepard, a senior engineer at Zapier, described a practical example: "We handed Claude Sonnet 5 a two-part job, update Salesforce account tiers, send a launch announcement to enterprise contacts, and it finished end to end. That used to stall halfway. For day-to-day automation, it's a no-brainer."
What Makes Sonnet 5 "Agentic," and Why Does That Matter?
Agentic capability refers to an AI model's ability to plan and execute multi-step tasks with minimal human intervention. Sonnet 5 can now use tools like web browsers and terminal commands, plan complex workflows, and run autonomously at a level that previously required larger, more expensive models. This shift reflects a broader industry trend: as OpenAI, Google, and other labs release their own agentic models, the differentiator is no longer whether a model can do agentic work, but how cheaply and reliably it can do so without human oversight.
Anthropic also raised rate limits for Chat, Cowork, Claude Code, and API users to accommodate the higher token usage that comes with more complex reasoning tasks. Users can now select effort levels to control how much computational power the model applies to a given task, allowing developers to balance speed and cost against accuracy.
How to Evaluate Sonnet 5 for Your Use Case
- Benchmark Performance: Compare Sonnet 5's scores on your specific task type against Opus 4.8 and GPT-5.5. On coding and reasoning, Sonnet 5 is within 5-10 percentage points of Opus 4.8, making it viable for most production workloads.
- Cost Modeling: Calculate your expected token usage and compare the introductory $2/$10 pricing through August against standard rates of $3/$15 per million tokens. For teams running agentic pipelines at scale, the savings during the migration window could be substantial.
- Effort Level Testing: Use the new effort level controls to test whether lower reasoning settings meet your accuracy requirements. This allows you to optimize for cost without sacrificing quality on simpler tasks.
- Safety Requirements: If your application involves cybersecurity or high-risk tasks, note that Sonnet 5 has lower performance on dangerous tasks than Opus 4.8 and Claude Mythos Preview, but still includes safeguards against misuse and deception.
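The cost-modeling step above can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the per-million-token rates are the ones quoted in this article, while the monthly token volumes, the `Rate` class, and the `monthly_cost` helper are hypothetical names and numbers chosen for the example. Substitute your own measured usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Price in USD per million tokens."""
    input_per_million: float
    output_per_million: float


# Rates as quoted in the article.
INTRO = Rate(input_per_million=2.0, output_per_million=10.0)     # through August 31
STANDARD = Rate(input_per_million=3.0, output_per_million=15.0)  # afterwards


def monthly_cost(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a month's token usage at the given rate."""
    return (
        input_tokens / 1_000_000 * rate.input_per_million
        + output_tokens / 1_000_000 * rate.output_per_million
    )


# Hypothetical workload (an assumption, not a figure from the article):
# 500M input tokens and 100M output tokens per month.
input_tokens = 500_000_000
output_tokens = 100_000_000

intro_cost = monthly_cost(INTRO, input_tokens, output_tokens)
standard_cost = monthly_cost(STANDARD, input_tokens, output_tokens)
savings = standard_cost - intro_cost

print(f"Introductory: ${intro_cost:,.2f}")
print(f"Standard:     ${standard_cost:,.2f}")
print(f"Savings:      ${savings:,.2f} ({savings / standard_cost:.0%})")
```

Because both introductory rates are two-thirds of the corresponding standard rates, the saving comes to roughly a third of the standard bill whatever the mix of input and output tokens. The absolute dollar amount depends entirely on volume.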
What About Safety and Misuse Risks?
Anthropic deliberately did not train Sonnet 5 on cybersecurity tasks, and the model's performance on dangerous security-related benchmarks lags significantly behind Opus 4.8. When tested on finding exploits in Firefox 147, for example, Sonnet 5 was unable to develop a fully working exploit, though it showed a slightly higher rate of partial success than Sonnet 4.6.
On other safety dimensions, Sonnet 5 improves over its predecessor. It demonstrates lower rates of undesirable behaviors such as cooperation with misuse, deception, hallucination, and sycophantic responses compared to Sonnet 4.6. It also refuses malicious requests more consistently and resists prompt-injection attacks more effectively.
"A model that knows when to say no is just as important as one that knows how to build," said Fabian Hedin, co-founder of Lovable, noting that Claude Sonnet 5 "refuses unsafe requests cleanly and consistently."
Why Is Pricing Strategy Important Right Now?
The introductory pricing through August 31 represents Anthropic's first formal promotional pricing for a Sonnet model launch. A company spokesperson explained the rationale: "We want our customers to test Sonnet 5 against their real workloads at the lowest possible cost during the migration window." This approach directly addresses a friction point developers have faced with Sonnet 4.6, where the decision between using a cheaper mid-tier model or paying more for Opus 4.8 was often unclear.
The timing also reflects competitive pressure. OpenAI launched three GPT-5.6 models in June, including Sol, which is also positioned as an agentic mid-tier option. Google's Gemini 3.5 Flash, released in May, similarly emphasizes autonomous task execution at lower cost. By offering Sonnet 5 at $2 per million input tokens through August, Anthropic is giving developers a window to evaluate whether the model meets their needs before standard pricing takes effect.
For teams that have been running GPT-4o or similar mid-tier models for coding agents, Sonnet 5's pricing is competitive without requiring a model-shopping exercise. The combination of improved agentic performance, lower cost, and expanded rate limits positions Sonnet 5 as a practical default for developers building autonomous workflows at scale (Source 2, 3).