Claude's New 'Dreaming' Feature Lets AI Agents Remember and Learn Between Sessions
Anthropic has released three new features for Claude Managed Agents that fundamentally change how AI agents behave in production environments: they can now remember lessons from past sessions, verify their own work against quality standards, and break complex jobs into parallel workstreams. The features, shipped on May 6, address what has been the largest gap in production agent infrastructure: agents that actually improve over time rather than repeating the same mistakes indefinitely.
What Is 'Dreaming' and Why Do Production Agents Need It?
Production AI agents have a well-known failure mode: they forget. An agent learns a workaround in one session and has no memory of it in the next, so the same job fails the same way, indefinitely. Dreaming is not a metaphor; it is a scheduled background process that reads an agent's memory store alongside transcripts of its past sessions, up to 100 at a time, and produces a consolidated, reorganized memory store.
The problem Dreaming solves is mundane but real. Agents write to memory incrementally during sessions, and over time those stores accumulate noise: stale entries, conflicting facts, repeated observations. No single session can clean this up because no single session sees the full history. Dreaming runs asynchronously, outside any active session, and does the consolidation work that agents cannot do themselves.
Crucially, the input memory store is never modified. The output is a new store that can be reviewed, and discarded if necessary, before it goes into use. Harvey, a legal AI company, reported roughly a 6x increase in task completion rates after deploying Dreaming, primarily because its agents stopped forgetting the same file-type quirks and tool workarounds from one session to the next.
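The consolidation step can be made concrete with a minimal Python sketch of the idea. This is not Anthropic's implementation or API. It assumes a hypothetical memory format in which each entry has a `key`, a `value`, the index of the `session` that wrote it, and a `stale` flag. The function name `consolidate` and the ranking rule (topics mentioned in more transcripts sort first) are illustrative choices. The sketch mirrors the behaviors described above: it drops stale entries, resolves conflicts by keeping the newest entry per key, caps a run at 100 transcripts, and returns a new store without touching the input.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_TRANSCRIPTS_PER_RUN = 100  # batch size quoted in the article


@dataclass(frozen=True)
class MemoryEntry:  # hypothetical memory format, not the Managed Agents schema
    key: str        # topic the entry is about
    value: str      # the remembered fact or workaround
    session: int    # session that wrote it; higher means newer
    stale: bool = False


def consolidate(store: Tuple[MemoryEntry, ...],
                transcripts: List[str]) -> Tuple[MemoryEntry, ...]:
    """Build a new store from an old one; the input is never modified."""
    if len(transcripts) > MAX_TRANSCRIPTS_PER_RUN:
        raise ValueError(f"at most {MAX_TRANSCRIPTS_PER_RUN} transcripts per run")
    latest: Dict[str, MemoryEntry] = {}
    for entry in store:
        if entry.stale:
            continue  # drop entries already marked out of date
        current = latest.get(entry.key)
        if current is None or entry.session > current.session:
            latest[entry.key] = entry  # newest entry wins a conflict
    # Surface recurring patterns: topics mentioned in more transcripts sort first.
    mentions = {key: sum(key in t for t in transcripts) for key in latest}
    return tuple(sorted(latest.values(), key=lambda e: (-mentions[e.key], e.key)))
```

Because the input is an immutable tuple of frozen entries, a reviewer can compare the old and new stores side by side and simply discard the result if it looks wrong.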
How Do Outcomes and Multiagent Orchestration Improve Agent Reliability?
Anthropic shipped two additional features alongside Dreaming to address different production challenges. Outcomes is a rubric-based self-evaluation loop where a developer writes a rubric defining what success looks like for a given task and attaches it to a session. When the agent produces output, a separate grader evaluates it against the rubric in its own isolated context window. If the output does not meet the rubric, the grader specifies what needs to change, and the agent revises. The loop runs up to a developer-configured maximum number of iterations.
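The loop can be sketched in a few lines of Python. Everything here is hypothetical: `run_with_outcome`, the `Verdict` type, and the toy agent and grader are illustrative stand-ins, not the Managed Agents API. The rubric is modeled as a list of phrases the output must contain, which is far cruder than a real rubric, but the control flow follows the description: produce output, grade it, feed the grader's feedback back to the agent, and stop on a pass or at the iteration cap. The grader's signature accepts only the output and the rubric, so it has no access to the agent's reasoning, in keeping with its isolated context window.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    passed: bool
    feedback: str = ""  # what the grader says must change; empty on a pass


# Hypothetical signatures. The grader sees only (output, rubric), never the
# agent's reasoning, mirroring the isolated context window described above.
Agent = Callable[[str, str], str]             # (task, feedback) -> output
Grader = Callable[[str, List[str]], Verdict]  # (output, rubric) -> verdict


def run_with_outcome(task: str, rubric: List[str], agent: Agent, grader: Grader,
                     max_iterations: int = 3) -> Tuple[str, int, bool]:
    """Return (final_output, attempts_used, passed)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback = ""
    for attempt in range(1, max_iterations + 1):
        output = agent(task, feedback)
        verdict = grader(output, rubric)
        if verdict.passed:
            return output, attempt, True
        feedback = verdict.feedback  # only the grader's feedback flows back
    return output, max_iterations, False


def keyword_grader(output: str, rubric: List[str]) -> Verdict:
    # Toy rubric: every item must appear verbatim in the output.
    missing = [item for item in rubric if item not in output]
    return Verdict(passed=not missing, feedback=" ".join(missing))


def toy_agent(task: str, feedback: str) -> str:
    # First draft is just the task; a revision appends whatever was missing.
    return f"{task} {feedback}".strip()
```

Calling `run_with_outcome("Summary", ["Summary", "Risks", "Next steps"], toy_agent, keyword_grader)` fails the first draft, revises once, and returns `("Summary Risks Next steps", 2, True)`. With `max_iterations=1` the same call returns the unrevised draft with `passed=False`, which is the case a production caller would need to handle explicitly.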
The critical design detail is the isolation. The grader cannot see the agent's reasoning history; it sees only the output and the rubric. This separation is what makes the feedback credible: the agent cannot produce a plausible-sounding self-justification and move on. Anthropic's internal benchmarks show an 8.4% improvement in .docx quality and a 10.1% improvement in .pptx quality compared to standard prompting loops. Wisedocs, a medical document review company, cut review time by 50% while keeping output aligned with its internal quality standards.
Multiagent Orchestration lets a lead agent break large or varied jobs into pieces and hand each piece to a specialist. Up to 20 subagents can run across up to 25 parallel threads. All agents share the same container filesystem; each runs in its own isolated context window, so one agent's reasoning does not bleed into another's. Netflix's platform team uses this to process build logs from hundreds of pipelines simultaneously, something that would be impractically slow as a sequential operation.
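A rough Python analogue of this fan-out uses a thread pool. It is an illustration under stated assumptions, not how Managed Agents schedules work: `orchestrate` and `run_subagent` are hypothetical names, pieces are assigned to the 20 subagents round-robin, the pool is capped at 25 threads, a local directory stands in for the shared container filesystem, and each subagent call uses only its own local state to mimic an isolated context window.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List

MAX_SUBAGENTS = 20          # limits quoted in the article
MAX_PARALLEL_THREADS = 25


def run_subagent(name: str, index: int, piece: str, workdir: Path) -> str:
    # Hypothetical specialist. Its context is a local variable built fresh on
    # every call, so one subagent's "reasoning" cannot leak into another's.
    context = [f"role: {name}", f"input: {piece}"]
    summary = f"{name}: {piece}"
    # The shared filesystem is the only channel between agents.
    (workdir / f"piece-{index:04d}.txt").write_text("\n".join(context + [summary]))
    return summary


def orchestrate(pieces: List[str], workdir: Path,
                n_subagents: int = MAX_SUBAGENTS) -> List[str]:
    """Lead agent: split the job, fan it out, collect results in order."""
    n = max(1, min(n_subagents, MAX_SUBAGENTS))
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_THREADS) as pool:
        futures = [
            pool.submit(run_subagent, f"subagent-{i % n}", i, piece, workdir)
            for i, piece in enumerate(pieces)
        ]
        return [f.result() for f in futures]
```

For example, calling `orchestrate` on 50 build-log names with a `tempfile.TemporaryDirectory()` as `workdir` returns 50 summaries in submission order and leaves 50 files in the shared directory for the lead agent to read back.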
Steps to Deploy Claude Managed Agents in Production
- Set Up Agent Environment: Create a Claude agent using the managed-agents-2026-04-01 beta header, which the SDK sets automatically. The official quickstart walks through agent creation, environment setup, and session streaming in under 50 lines of Python.
- Define Quality Rubrics: For tasks where output quality matters, write explicit rubrics defining success criteria and attach them to sessions. Outcomes is in public beta with no waitlist, making it immediately available for teams building production systems.
- Enable Dreaming for Memory Consolidation: Schedule background Dreaming processes to consolidate agent memory stores and surface patterns across multiple sessions. Dreaming is currently in research preview with limited access and supports the claude-opus-4-7 and claude-sonnet-4-6 models.
- Implement Multiagent Orchestration for Complex Jobs: When a single agent cannot handle the full scope of work, use multiagent orchestration to decompose tasks into parallel workstreams. Full observability is available through the Claude Console, showing which agent did what, in which order, and why.
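The checklist above can be turned into a simple pre-flight check. This is a hypothetical sketch, not part of any SDK: `DeploymentPlan`, `check_plan`, and `request_headers` are invented names, and the limits and model list only restate the figures in this article. The `anthropic-beta` header name and the `anthropic-version` value follow general Anthropic API conventions; since the official SDK is described as setting the Managed Agents beta header automatically, `request_headers` would only matter for raw HTTP calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BETA_HEADER = "managed-agents-2026-04-01"
DREAMING_MODELS = {"claude-opus-4-7", "claude-sonnet-4-6"}  # per the article
MAX_SUBAGENTS = 20
MAX_PARALLEL_THREADS = 25


@dataclass
class DeploymentPlan:  # hypothetical; not an SDK type
    model: str
    quality_critical: bool = False
    rubric: List[str] = field(default_factory=list)
    enable_dreaming: bool = False
    subagents: int = 0
    parallel_threads: int = 0


def request_headers(api_key: str) -> Dict[str, str]:
    # Only needed for raw HTTP calls; the SDK is said to add the beta header itself.
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": BETA_HEADER,
    }


def check_plan(plan: DeploymentPlan) -> List[str]:
    """Return a list of problems; an empty list means the plan looks consistent."""
    problems = []
    if plan.quality_critical and not plan.rubric:
        problems.append("quality-critical task has no rubric for Outcomes")
    if plan.enable_dreaming and plan.model not in DREAMING_MODELS:
        problems.append(f"Dreaming does not list {plan.model} as supported")
    if plan.subagents > MAX_SUBAGENTS:
        problems.append(f"at most {MAX_SUBAGENTS} subagents")
    if plan.parallel_threads > MAX_PARALLEL_THREADS:
        problems.append(f"at most {MAX_PARALLEL_THREADS} parallel threads")
    return problems
```

A passing check says nothing about access: Dreaming is in research preview with limited access, so enabling it in a plan still depends on your account being admitted.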
Why Does This Matter for Teams Moving Agents Into Production?
Individual agent improvements, such as better prompting, tool use, or longer context windows, change what an agent can do in a single session. Dreaming, Outcomes, and Multiagent Orchestration are different in kind: they address how agents behave across sessions, over time, at scale. That is the gap between a demo and a production system.
Agents that improve between runs, that verify their own outputs against a rubric, and that decompose complex jobs into parallel workstreams are not just more capable; they are fundamentally more trustworthy. For teams moving agents out of prototypes and into real systems, that is the distinction that matters. All Managed Agents features are available on the Claude Platform API: Outcomes and Multiagent Orchestration are in public beta and require no access request, Memory and Webhooks are in public beta, and Dreaming is in research preview with limited access.
The release signals a maturation in how Anthropic thinks about production AI systems. Rather than focusing solely on making individual models more capable within a single conversation, the company is now building infrastructure that lets agents learn, verify, and coordinate across time and scale. For enterprises deploying AI agents in mission-critical roles, these features represent a meaningful step toward systems that improve rather than degrade over time.