Why Anthropic's New Agent Infrastructure Matters More Than Better AI Models
Anthropic is no longer just selling access to Claude; it is building the unglamorous infrastructure that makes AI agents actually work in production. At its Code with Claude event in May 2026, the company unveiled Managed Agents, a system designed to solve problems that have plagued every serious agent deployment: sessions that disappear when things break, tool calls that fail silently, and workflows that collapse mid-task with no way to resume.
What's Actually Broken About Today's AI Agents?
When AI researchers demo agents at conferences, they show ten-minute clips of impressive work. In production, agents often fail within hours. The gap isn't about model intelligence; it's about infrastructure. A production agent needs far more than a smart language model. It needs persistent sessions that survive failures, reliable tool execution, sandboxed environments for code and file operations, readable memory systems, bounded permissions, retry logic, human review points, and detailed logs of what actually happened.
Most teams building serious agent workflows end up rebuilding all of this from scratch, usually during a crisis while shipping something else. Anthropic's engineering team identified a specific problem that illustrates the issue: Claude Sonnet 4.5 would prematurely wrap up tasks as its context window filled, a behavior they called "context anxiety." They added context resets to compensate. When they tested the same setup on Claude Opus 4.5, that behavior vanished, and the resets became dead code.
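To make the anecdote concrete, here is a minimal sketch of what such a compensation layer tends to look like. The token budget, threshold, and helper names are illustrative assumptions, not Anthropic's actual code; the point is that the entire function exists only to manage one model generation's quirk.

```python
# Illustrative only: the kind of context-reset workaround described above,
# tuned for one model's behavior and dead code on the next. The budget,
# threshold, and summarizer are hypothetical, not Anthropic's implementation.

TOKEN_BUDGET = 200_000      # assumed context window size
RESET_AT = 0.75             # reset well before the window fills

def crude_summary(messages: list[dict]) -> str:
    # Stand-in: a real system would ask the model to compress its own transcript.
    return " | ".join(m["content"][:80] for m in messages[1:])

def maybe_reset_context(messages: list[dict], tokens_used: int) -> list[dict]:
    """Compact the transcript near the limit so the model never 'sees' a
    nearly full window and wraps up the task prematurely."""
    if tokens_used < TOKEN_BUDGET * RESET_AT:
        return messages                     # plenty of room: leave it alone
    return [
        messages[0],                        # keep the system prompt
        {"role": "user", "content": f"Progress so far: {crude_summary(messages)}"},
    ]
```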
How Does Anthropic's New Managed Agents System Actually Work?
Rather than building agents as single fragile loops, Anthropic virtualized three core components: the session (an append-only log of everything that happened), the agent loop (the process that calls Claude and routes tool calls), and the sandbox (where Claude runs code and edits files). Because these three pieces are decoupled, any one of them can fail, be replaced, or scale independently without taking down the others.
The practical difference is significant. In older designs, if a container crashed, the entire session disappeared. If the container became unresponsive, engineers had to debug inside it, but doing so risked exposing user data, which made real troubleshooting effectively impossible. With Managed Agents, when the agent loop fails, a new one boots up, fetches the session log, and resumes from the last recorded event. The session survives because it lives outside the loop.
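A minimal sketch of that separation, assuming a file-backed event log and hypothetical event shapes; this illustrates the architecture, not Anthropic's Managed Agents API:

```python
# The session is an append-only event log that lives outside the agent loop,
# so a replacement loop can rehydrate state after a crash. File-backed
# storage and the event format are assumptions for illustration.
import json
import pathlib

class SessionLog:
    """Append-only record of everything that happened, durable across loops."""

    def __init__(self, path: str):
        self.path = pathlib.Path(path)
        self.path.touch(exist_ok=True)

    def append(self, event: dict) -> None:
        with self.path.open("a") as f:
            f.write(json.dumps(event) + "\n")   # one durable event per line

    def replay(self) -> list[dict]:
        with self.path.open() as f:
            return [json.loads(line) for line in f if line.strip()]

def resume_agent_loop(session: SessionLog) -> None:
    history = session.replay()   # a fresh loop starts from the last recorded event
    # Rebuild the model transcript and tool state from `history`, then continue.
    # Each new model response or tool result is appended *before* it is acted
    # on, so a crash at any point loses no committed work.
    for event in history:
        ...  # restore messages / pending tool calls
```

The design choice doing the work here is write-ahead ordering: because every event is committed to the log before the loop acts on it, "resume" is just replay.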
Steps to Building Agents That Actually Survive Production
- Design for Resumability: Build agent workflows assuming they will fail and need to restart from a known state, not as single continuous runs that must succeed on the first attempt.
- Write Outcome Rubrics Before Deployment: Define what success looks like in measurable terms before the agent runs, so the system knows when to stop, retry, or escalate without human intervention.
- Separate Execution from Evaluation: Use a cheaper, faster model to execute tasks end-to-end while reserving stronger models for decision points and quality checks, reducing costs and improving reliability.
- Implement Bounded Permissions: Ensure agents can only access and modify the specific data and tools they need, with auditable logs of every action taken (a minimal sketch follows this list).
- Plan for Context Window Management: Architect workflows to handle context resets and session persistence as the agent processes longer tasks.
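For the bounded-permissions step, the core mechanism is an allowlist in front of the tool registry plus an append-only audit trail. The sketch below assumes hypothetical tool functions and a JSONL audit file; it illustrates the pattern, not a real Anthropic API:

```python
# A sketch of bounded permissions: the agent sees only an allowlisted subset
# of tools, and every call is written to an audit trail. Tool functions and
# the JSONL audit format here are hypothetical.
import datetime
import json

class BoundedToolbox:
    def __init__(self, tools: dict, allowed: set[str], audit_path: str):
        # The agent never learns that disallowed tools even exist.
        self.tools = {name: fn for name, fn in tools.items() if name in allowed}
        self.audit_path = audit_path

    def call(self, name: str, **kwargs):
        if name not in self.tools:
            raise PermissionError(f"tool {name!r} is outside this agent's bounds")
        result = self.tools[name](**kwargs)
        with open(self.audit_path, "a") as f:   # auditable log of every action
            record = {
                "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                "tool": name,
                "args": kwargs,
            }
            f.write(json.dumps(record, default=str) + "\n")
        return result

# Usage: grant read access only, even if more tools exist in the registry.
# toolbox = BoundedToolbox(
#     tools={"read_file": read_file, "send_email": send_email},
#     allowed={"read_file"},
#     audit_path="agent_audit.jsonl",
# )
```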
The Outcomes feature is particularly important. Most agent implementations treat success as implicit; the agent runs until it stops, and a human decides if the result is good. At scale, this fails. An agent running for an hour might produce something subtly wrong with no one watching. Outcomes let developers describe success with a rubric, and a separate grader evaluates the agent's output in its own context window. The agent can retry based on what the grader reports as wrong.
This forces teams to specify what "done" actually means. For lead qualification, that means defining which score threshold and which data requirements constitute a qualified lead. For proposal preparation, it means identifying the mandatory sections and the quality bar for each. For content research, it means specifying what counts as a complete source set. Most teams skip this because writing good acceptance criteria is genuinely difficult, but skipping it means there's no systematic way to know when to stop or when to escalate.
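A minimal sketch of that loop, using the public Anthropic Python SDK for the model calls; the rubric text, JSON contract, and model ids are illustrative assumptions rather than the actual Outcomes API:

```python
# A sketch of rubric-driven grading: the executor produces a draft, a separate
# grader call (its own context window) scores it against explicit criteria,
# and the executor retries with the grader's report.
import json
import anthropic

client = anthropic.Anthropic()

RUBRIC = """A lead-qualification report is done only if:
1. Fit score >= 70, with the scoring inputs shown.
2. Company name, size, industry, and a named contact are all present.
3. Every factual claim cites a source URL.
Respond with JSON only: {"pass": true/false, "failures": ["..."]}"""

def grade(output: str) -> dict:
    resp = client.messages.create(
        model="claude-opus-4-5",       # stronger model reserved for judging
        max_tokens=500,
        messages=[{"role": "user",
                   "content": f"Rubric:\n{RUBRIC}\n\nOutput to grade:\n{output}"}],
    )
    return json.loads(resp.content[0].text)  # assumes the grader obeys the contract

def run_with_outcome(task: str, max_retries: int = 3) -> str:
    feedback = ""
    for _ in range(max_retries):
        resp = client.messages.create(
            model="claude-sonnet-4-5", # cheaper model does the actual work
            max_tokens=2000,
            messages=[{"role": "user", "content": task + feedback}],
        )
        output = resp.content[0].text
        report = grade(output)         # graded in a fresh context window
        if report["pass"]:
            return output
        feedback = "\n\nFix these issues: " + "; ".join(report["failures"])
    raise RuntimeError("rubric not met after retries; escalate to a human")
```

The detail that matters is that the grader sees only the rubric and the artifact, never the agent's transcript, so it can't be talked into approving the agent's own narration of its work.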
Why Are Anthropic's Compute Announcements Actually Significant?
Alongside the Managed Agents announcement, Anthropic made infrastructure changes that directly address production constraints. Claude Code's five-hour rate limits doubled for Pro, Max, Team, and seat-based Enterprise plans. The peak-hours reduction of those limits for Pro and Max users was eliminated. Opus API rate limits increased considerably. Within the month, Anthropic expects access to over 300 megawatts of compute via SpaceX Colossus 1, translating to more than 220,000 NVIDIA GPUs.
These sound like backend engineering announcements, but they're actually product decisions that define what workflows are possible. Operators building long-horizon agent workflows felt the constraint directly over recent months: tighter limits, inconsistent behavior under load, and disruption to third-party agent setups. Teams designed workflows around conservative time and call budgets, cutting scope on exactly the tasks agents are most useful for. A doubled five-hour limit isn't abstract if you're running a multi-repo refactoring job with end-to-end testing and documentation generation on a real codebase. An agent that gets cut off mid-task and can't resume cleanly isn't a product; it's a prototype with a time bomb.
The compute announcement is a direct response to that constraint, not speculative roadmap planning. Anthropic is explicitly connecting infrastructure capacity to the promise that agents can do long work reliably. If your workflow architecture has been designed around conservative limits, you now have room to reconsider scope. But only agents built for resumability will benefit. Raw runtime increase doesn't help if your session design assumes the run will always succeed in one shot.
What's the Advisor Strategy, and Why Should You Care?
Of everything covered at Code with Claude, the advisor strategy is the pattern worth adopting immediately. Not because it's technically novel, but because it maps directly onto workflows that already exist. The pattern works like this: a cheaper, faster model acts as executor, running the task end-to-end, calling tools, reading results, iterating, and producing output. A stronger model acts as advisor only at specific decision points, seeing the shared context and returning guidance, corrections, or stop signals.
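A minimal sketch of that division of labor, again with illustrative model ids and a deliberately crude decision-point convention (steps prefixed with "DECIDE:"); a production version would use structured tool calls rather than string prefixes:

```python
# A sketch of the advisor split: a fast model executes every step; a stronger
# model is consulted only at flagged decision points and can approve, correct,
# or stop the run. Model ids and the prefix convention are illustrative.
import anthropic

client = anthropic.Anthropic()

def ask(model: str, prompt: str, max_tokens: int = 1500) -> str:
    resp = client.messages.create(model=model, max_tokens=max_tokens,
                                  messages=[{"role": "user", "content": prompt}])
    return resp.content[0].text

def run_with_advisor(task: str, steps: list[str]) -> list[str]:
    shared_context = [f"Task: {task}"]
    results: list[str] = []
    for step in steps:
        draft = ask("claude-haiku-4-5",          # executor: fast and cheap
                    "\n".join(shared_context) + f"\n\nDo this step: {step}")
        if step.startswith("DECIDE:"):           # escalate only at decision points
            verdict = ask("claude-opus-4-5",     # advisor: sees the shared context
                          "\n".join(shared_context)
                          + f"\n\nProposed: {draft}\n"
                          "Reply APPROVE, or corrections, or STOP plus a reason.")
            if verdict.startswith("STOP"):
                break                            # advisor's stop signal ends the run
            if not verdict.startswith("APPROVE"):
                draft = ask("claude-haiku-4-5",  # executor revises per guidance
                            f"Revise this draft per advisor feedback.\n"
                            f"Feedback: {verdict}\n\nDraft:\n{draft}")
        shared_context.append(f"{step}: {draft}")
        results.append(draft)
    return results
```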
The instinct when building agent workflows is to use the most capable model for everything. The advisor strategy flips that. It reserves expensive, powerful models for decisions that actually need them, while letting faster models handle execution. This reduces costs, improves speed, and often produces better results because the advisor can focus entirely on quality control rather than task execution.
Anthropic's shift from selling model access to productizing agent infrastructure represents a maturation in how AI companies think about deployment. The gap between a ten-minute demo and a production system running for hours isn't a model problem; it's an engineering problem. By addressing session persistence, tool reliability, outcome evaluation, and compute capacity, Anthropic is acknowledging that the next phase of AI adoption depends less on raw intelligence and more on systems that actually work when things go wrong.