
Anthropic shipped three new capabilities for Claude Managed Agents on May 6 that address a problem nobody talks about enough: agents that work in demos but degrade in production. The headline feature, Dreaming, lets agents review their own past sessions, consolidate memory, and surface patterns they couldn’t see in any single run. Harvey, the legal AI platform, enabled it on their document agents and reported roughly a 6x jump in task completion. Two more features arrived in public beta the same day — Outcomes and Multiagent Orchestration — and together they make a compelling case that managed agents are finally ready for serious production workloads.
Dreaming: Agents That Get Better Between Sessions
Memory stores are the long-running state for Managed Agents. The problem is that agents write to them incrementally: one session adds a preference note, the next contradicts it, three sessions later the store is full of stale entries and duplicates. Dreaming is an async cleanup and synthesis job that fixes this automatically.
You give it a memory store and up to 100 past session transcripts. It reads them, produces a new reorganized store — the input is never modified — merges duplicates, drops stale entries, and surfaces patterns that span sessions: recurring mistakes, workflows agents converge on, team preferences that showed up across different users. The output store is an ordinary store you can review and either attach to future sessions or discard.
The Python call is straightforward:
dream = client.beta.dreams.create(
inputs=[
{"type": "memory_store", "memory_store_id": store_id},
{"type": "sessions", "session_ids": [session_a, session_b]},
],
model="claude-opus-4-8",
instructions="Focus on coding-style preferences; ignore one-off debugging notes.",
)
# Poll until dream.status == "completed", then:
output_store_id = next(
o.memory_store_id for o in dream.outputs if o.type == "memory_store"
)
One nuance worth getting right: the instructions field steers synthesis at a high level. “Focus on coding-style preferences” is useful. “Change sentence X to Y” produces no change. Dreams are a synthesis pipeline, not a text editor — for targeted edits to individual memories, use the Memory Stores API directly on the output store afterward. The API requires two beta headers: managed-agents-2026-04-01 and dreaming-2026-04-21, though the SDK sets both automatically.
Dreaming currently supports claude-opus-4-8, claude-opus-4-7, and claude-sonnet-4-6. It is in research preview with a waitlist. Apply for access at claude.com/form/claude-managed-agents.
Outcomes: Write a Rubric, Get Reproducible Quality
The standard agent loop is: prompt in, output out, developer stares at the output and guesses whether it’s acceptable. Outcomes replaces the guessing with a grader.
You write a rubric — a markdown document listing explicit, gradeable criteria — and send it to the session via a user.define_outcome event. A separate grader evaluates the agent’s output in its own context window (isolated from the agent’s reasoning to avoid bias), returns a pass/fail breakdown with an explanation, and the agent revises. This loop continues until the rubric is satisfied or max_iterations is reached (default 3, max 20).
Anthropic’s internal testing showed a 10 percentage point improvement in overall task success, with file generation jumping 8.4% for docx and 10.1% for pptx. The gains come from the agent knowing exactly which criteria it failed and having a defined path to fix them.
Rubric quality determines how well this works. “The data looks good” produces a noisy grader. “The CSV contains a price column with numeric values in every row” is gradeable. A practical shortcut: feed Claude a known-good example of your expected output and ask it to analyze what makes it good, then turn that analysis into rubric criteria. The result is usually better than writing one from scratch. Outcomes are available in public beta now via the Claude Platform API.
Multiagent Orchestration: One Session, Multiple Specialists
When a task is too broad for a single agent to handle well, Multiagent Orchestration lets a coordinator agent break it into pieces and hand each to a specialist with its own model, system prompt, and tool access. Specialists run in parallel on a shared filesystem, each with an isolated context window and persistent thread history.
Netflix’s platform team uses this pattern to analyze build logs: hundreds of builds are processed in parallel batches by specialist agents, and the coordinator surfaces only the recurring patterns worth acting on. The alternative — a single agent doing sequential analysis — is both slower and more likely to lose signal in the noise.
Setup requires one configuration change on the coordinator agent: set multiagent.type to "coordinator" and list the specialist agents in the roster. The platform supports up to 20 unique agents and 25 concurrent threads, with one delegation level. The Claude Console shows full delegation and execution order for every session, which makes debugging multi-agent flows significantly less painful.
Where to Start
If you are already running Managed Agents in production, Outcomes is the highest-leverage starting point. It requires no waitlist, is measurable from the first deployment, and the rubric-based approach fits naturally into any task with defined success criteria. Define a rubric for your most critical agent task, set max_iterations to 3, and compare success rates before and after.
For Dreaming, build your memory architecture now using the Memory API — already in public beta — and apply for research preview access. Treat Dreaming as the upgrade layer: get your memory structure solid first, add Dreaming once you have access.
Multiagent Orchestration makes sense when you have a task that naturally parallelizes or requires domain expertise that is hard to pack into a single system prompt. Do not adopt it because it is new — adopt it when your single-agent solution hits a ceiling. Full documentation is at platform.claude.com/docs/en/managed-agents/overview and the official announcement has the full benchmark data.













