Anthropic's New 'Dreaming' Feature Lets AI Agents Learn From Their Mistakes Between Jobs
Anthropic has introduced a new capability called 'dreaming' that allows Claude agents to review their own work between sessions and extract lessons for future tasks, addressing a fundamental limitation in how AI agents currently operate. The feature, unveiled at Code with Claude 2026 in San Francisco on May 6, mimics how human brains consolidate memories during sleep. Legal AI startup Harvey, which piloted the technology, reported that task completion rates climbed roughly sixfold after dreaming was enabled.
How Does Dreaming Actually Work for AI Agents?
Dreaming does not retrain Claude's underlying model. Instead, it functions as a structured note-taking system that runs between agent sessions. When enabled, Anthropic schedules a background process that periodically reads through an agent's recent work and its memory store, looking for three specific patterns: recurring mistakes the agent keeps making, workflows the agent converges on across different jobs, and preferences that have emerged across a team of agents.
The system then rewrites the agent's memory, condensing stale information and promoting what has become essential. Developers can choose to let dreaming update memory automatically or require human review before changes take effect. This approach directly addresses a classic failure mode in long-running agent work, where every new session starts fresh and the agent must relearn the same quirks repeatedly.
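Anthropic has not published an API for dreaming, but the consolidation step it describes can be sketched in plain Python. Everything below is illustrative: the function names, the note format, and the promotion rule are assumptions, not Anthropic's implementation.

```python
from collections import Counter

def consolidate_memory(session_notes, promote_threshold=2):
    """Hypothetical consolidation pass over an agent's recent notes.

    session_notes: list of (tag, lesson) tuples from recent runs, where
    tag is one of the patterns dreaming looks for ("mistake",
    "workflow", "preference"). Lessons seen repeatedly are promoted to
    core memory; one-off observations are compressed as stale.
    """
    counts = Counter(lesson for _tag, lesson in session_notes)
    promoted = [lesson for lesson, n in counts.items() if n >= promote_threshold]
    stale = [lesson for lesson, n in counts.items() if n < promote_threshold]
    # Rewritten memory keeps recurring lessons verbatim and condenses
    # the rest into a single summary line.
    return {
        "core_lessons": promoted,
        "archive_summary": f"{len(stale)} one-off observations archived",
    }

notes = [
    ("mistake", "convert .docx tables before parsing"),
    ("mistake", "convert .docx tables before parsing"),
    ("workflow", "run citation check last"),
]
memory = consolidate_memory(notes)
```

A human-review gate, as the article describes, would simply hold the returned dict for approval before writing it back to the agent's memory store.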
Harvey's experience illustrates the practical impact. Before dreaming, the legal AI startup's agents kept forgetting file type quirks and tool-specific workarounds between sessions, causing the same legal-drafting jobs to fail in identical ways over and over. With dreaming enabled, those workarounds stuck, and task completion rates improved dramatically.
What Other Agent Features Did Anthropic Release Alongside Dreaming?
Anthropic shipped three Managed Agents features on the same day. Dreaming entered research preview, requiring developers to request access, while the other two moved into public beta.
- Outcomes: A self-grading loop where a separate evaluator scores an agent's output against a written rubric and tells the agent exactly what to fix. Anthropic's internal benchmarks claim Outcomes improved task success by up to 10 percentage points over standard prompting, with file-generation quality rising 8.4% on .docx outputs and 10.1% on .pptx outputs.
- Multiagent Orchestration: Allows a lead agent to break complex jobs into chunks and hand each to specialist subagents with their own models, prompts, and tools. The specialists run in parallel on a shared filesystem and remain individually traceable in the Claude Console.
- Memory Consolidation: Dreaming refines agent memory between sessions, pulling shared learnings across agents and keeping information up-to-date, addressing the core bottleneck Anthropic identified in autonomous agent work.
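The Outcomes loop is the most mechanical of the three, so it is worth making concrete. The sketch below is a minimal stand-in, not Anthropic's API: the grader here checks for required phrases, whereas the real feature uses a separate evaluator model, and every name is hypothetical.

```python
def evaluator(output, rubric):
    """Hypothetical grader: checks output against a rubric, here a list
    of required phrases, and reports exactly what is missing."""
    missing = [req for req in rubric if req not in output]
    return {"passed": not missing, "feedback": missing}

def run_with_outcomes(agent, rubric, max_attempts=3):
    """Run the agent, grade its output, and feed the specific fixes
    back until the output meets the bar or attempts run out."""
    feedback = []
    for _ in range(max_attempts):
        output = agent(feedback)
        grade = evaluator(output, rubric)
        if grade["passed"]:
            return output
        feedback = grade["feedback"]  # tell the agent exactly what to fix
    return output

# Toy agent that only adds the risks section once the grader asks for it.
def toy_agent(feedback):
    draft = "Summary of findings."
    if feedback:
        draft += " Risks: none identified."
    return draft

result = run_with_outcomes(toy_agent, rubric=["Summary", "Risks"])
```

Wiring the `passed` signal to a webhook, as the implementation guidance below suggests, turns this into a run-grade-notify pipeline that only surfaces output meeting the quality bar.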
Netflix's platform team is already using multiagent orchestration to process build logs from hundreds of source repositories in parallel. Spiral, a writing tool built by the publication Every, uses a similar structure with a Haiku-based lead agent delegating drafting to Opus-based subagents that run side by side. Wisedocs, a document-review startup, reported that reviews now run 50% faster since adopting Outcomes for grading.
Why Does Memory Matter More Than Raw Capability for AI Agents?
Anthropic's Chief Product Officer Ami Vora framed the company's strategic focus clearly at the conference: "Memory lets each agent capture what it learns as it works. Dreaming refines that memory between sessions, pulling shared learnings across agents and keeping it up-to-date." The through-line across all three announcements reveals Anthropic's core insight: the bottleneck preventing agents from working autonomously without human oversight is not raw intelligence but the ability to retain and apply lessons over time.
The published evidence so far is limited to customer testimonials. Anthropic released no independent benchmark with the launch, so practitioners will want to know whether Harvey's 6x improvement is representative of broader agent workflows or specific to long-form legal-drafting tasks where the company already had a clear pre-dreaming failure mode.
How to Implement Agent Memory Systems in Your Workflow
- Enable Dreaming for Long-Running Tasks: Activate dreaming on Managed Agents handling jobs that span multiple sessions, days, or operators, where the same agent or fleet of agents keeps returning to the same problem space. This is where the feature delivers the most value.
- Set Up Outcomes Rubrics: Write plain-language rubrics describing what successful output looks like for your specific use case. Wire Outcomes to webhooks so agents run, get graded, and notify you only when output meets your quality bar.
- Architect Multiagent Workflows: Design complex jobs as a lead agent that breaks work into chunks and delegates to specialist subagents running in parallel. This pattern works best for batch processing, analysis, and content generation tasks.
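The lead-agent pattern in the last step maps naturally onto a fan-out/fan-in structure. The sketch below uses Python's standard `concurrent.futures` to mimic it; the splitting rule and the trivial specialist are placeholders, since Anthropic's subagents would each carry their own model, prompt, and tools.

```python
from concurrent.futures import ThreadPoolExecutor

def lead_split(job, n_chunks):
    """Lead agent: break the job into roughly equal chunks."""
    size = max(1, len(job) // n_chunks)
    return [job[i:i + size] for i in range(0, len(job), size)]

def specialist(chunk):
    """Subagent stand-in: process one chunk. A real specialist would
    call its own model with its own prompt and tools."""
    return [item.upper() for item in chunk]

def orchestrate(job, n_workers=3):
    chunks = lead_split(job, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(specialist, chunks))  # parallel fan-out
    # Lead agent merges the specialists' outputs back into one result.
    return [item for chunk in results for item in chunk]

logs = ["repo-a build ok", "repo-b build failed", "repo-c build ok"]
merged = orchestrate(logs)
```

This shape, one splitter, parallel workers, one merger, is why the pattern suits batch processing jobs like Netflix's build-log analysis: chunks are independent, so throughput scales with the number of subagents.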
The conference that launched these features was the first of three stops on Anthropic's developer-conference tour, with London following on May 19 and Tokyo on June 10. The timing reflects Anthropic's push to help enterprise customers build agents that can operate with less human supervision, a shift that requires solving the memory problem first.
Anthropic's move also comes as the broader AI industry grapples with rising compute costs. The company has been experimenting with tiered services for Claude Code users, highlighting the industry's struggle to balance supply and demand while maintaining profitability. By enabling agents to learn and improve over time, Anthropic is positioning Claude as a tool that becomes more efficient the longer it operates, potentially reducing the computational overhead per task.