
Hermes Agent shipped asynchronous subagent delegation on June 16. If you’ve been using delegate_task in any production capacity, this matters: spawning child agents no longer freezes your parent conversation. The fix is a single command — hermes update — and what it unlocks is more significant than the release notes suggest.
The Blocking Problem Nobody Was Fixing
Multi-agent AI workflows have a well-known failure mode that most frameworks politely ignore: synchronous delegation creates a traffic jam. When a parent agent spawns a child, it blocks — sitting in a tool call, accumulating the child’s entire intermediate reasoning, tool invocations, and output into its own context window. Do this with three or four subagents in sequence and you’re burning tokens on context you never needed, slowing every downstream response, and setting yourself up for the kind of context rot that causes 65% of enterprise AI agent failures in production.
Hermes had the same problem. delegate_task was synchronous by design, which meant the parent waited. Until now.
What Async Subagents Actually Change
The new async model adds a full lifecycle API: spawn, check, steer, collect, cancel, and list. The parent can fire off subagents and keep working. Results arrive when the subagents are done. Under the hood, each child runs as an in-process thread with the same credentials and toolset as delegate_task, with output captured to an in-memory ring buffer.
Two modes are available. Fire-and-forget is the default: spawn a subagent, let it run, collect results later. Blocking mode remains available when you need ordered completion — the parent waits until all subagents finish before continuing.
The performance improvement is concrete. In Nous Research’s load test with 10 parallel requests, async delegation completed in under 1.2 seconds — a 4.8x throughput improvement over sequential execution. For long-running pipelines, that’s not a nice-to-have; it’s the difference between a workflow that completes in a reasonable timeframe and one that doesn’t.
Context Isolation: The Part Worth Paying Attention To
The performance gain is the headline, but the architectural change is more interesting. Each subagent now operates in complete isolation: its own conversation context, its own terminal session, and a restricted toolset defined at spawn time. The child agent has zero knowledge of the parent’s conversation history. Its entire context is the goal and context fields the parent populates at delegation time. Only the final summary returns to the parent.
This is the right default. It means the parent’s context window stays clean regardless of how complex the child’s work gets. A subagent can spend 2,000 tokens reasoning through a problem, calling a dozen tools, and exploring dead ends — none of that lands in the parent’s context. The parent sees a summary. That’s it.
Concurrent execution caps at three subagents by default, though this is configurable without a hard ceiling. File coordination, added in v0.11.0 earlier this year, prevents concurrent sibling agents from overwriting each other’s filesystem edits — a detail that matters when agents are actually doing work rather than just reasoning about it.
How to Use It
Update with hermes update. The async tools are available immediately. A practical fan-out pattern looks like this:
# Spawn three research agents in parallel — parent does not wait
spawn("Research Python async patterns", context=project_context)
spawn("Research Go concurrency models", context=project_context)
spawn("Research Rust tokio ecosystem", context=project_context)
# Parent continues other work here
# When ready, collect all three summaries
results = collect()
This pattern makes sense for any workflow where subtasks are independent: parallel code reviews (one agent per module), A/B content generation, multi-source research aggregation, or long-running background data processing that should not interrupt the main session. The official subagent docs cover additional patterns including the steer API, which lets you inject guidance into a running subagent mid-task.
Where Hermes Fits in Your Stack
The clearest framing for Hermes versus the other options: Claude Code handles deep, focused coding work; Hermes handles background automation and multi-step pipelines. OpenClaw is the consumer productivity play; Hermes is the developer runtime. Async subagents make that distinction cleaner — you now have a credible answer for anyone who asks how to run parallel agent workflows without the context overhead that made earlier approaches impractical.
Context engineering — getting the right information into the right context at the right time — has become the defining production challenge for AI systems in 2026. Hermes Agent async subagents with hard context isolation are a direct, practical response to that problem. The blocking version was a reasonable first step. The async version is what production workloads actually need.













