Your agent’s real bottleneck isn’t the model — it’s the loop. Standard tool-calling agents work in rounds: the model picks a tool, calls it, gets a result, then picks the next one. Five tools means five separate round-trips to the model API, five full context windows re-sent, five chances for things to go wrong. Microsoft’s CodeAct, shipped as part of the Microsoft Agent Framework at Build 2026, collapses that loop. The model writes a Python program. A sandboxed VM runs it once. Done. Microsoft’s benchmarks: 52% latency reduction, 64% fewer tokens.
Why the Loop Is Expensive
The cost of standard tool calling isn’t obvious until you trace it. Every time an agent decides to call a tool, it sends the entire conversation context — system prompt, message history, tool definitions — to the model API. That’s the same overhead whether the model is kicking off a 10-second data fetch or concatenating two strings. A five-step agent plan has five of these round-trips, each burning latency and tokens.
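A back-of-the-envelope model makes the cost visible. The sketch below assumes that each round-trip re-sends a fixed base context plus every tool result gathered so far, and that CodeAct needs a single model call whose intermediate results never enter the context. The token counts are made-up illustrative values, not Microsoft's benchmark data.

```python
def loop_input_tokens(base_context: int, result_tokens: list[int]) -> int:
    """Input tokens sent across a standard tool-calling loop (illustrative model).

    One model call per tool. Each call re-sends the base context (system prompt,
    message history, tool definitions) plus every tool result gathered so far.
    """
    total = 0
    accumulated = 0
    for tokens in result_tokens:
        total += base_context + accumulated  # the call that picks this tool
        accumulated += tokens  # this tool's result joins the context
    return total


def codeact_input_tokens(base_context: int) -> int:
    """Simplified assumption: one call writes the whole program, and
    intermediate tool results stay inside the sandbox."""
    return base_context


# Hypothetical numbers: a 4,000-token base context and five tools returning ~800 tokens each.
loop_total = loop_input_tokens(4000, [800] * 5)
codeact_total = codeact_input_tokens(4000)
print(loop_total, codeact_total)  # 28000 4000
```

Because every call re-sends everything before it, the accumulated-results term grows quadratically with the number of steps. The toy model ignores the tokens the model spends writing the program, so real savings will be smaller than this ratio suggests.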
CodeAct changes the unit of work. Instead of asking “what’s the next tool to call?”, the model is asked to write a short Python program that handles the entire plan. The program calls your tools via a call_tool() function, handles intermediate data directly in code, and returns a final result in one execution pass. Five model turns become one.
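The mechanics can be sketched in plain Python. Everything here is hypothetical: the tool names, the `call_tool(name, **kwargs)` signature, and the convention that the program leaves its answer in a variable named `result` are assumptions for illustration, not the framework's actual contract. Crucially, `exec()` is not a sandbox; it only stands in for the Hyperlight VM so the flow is visible.

```python
from typing import Any, Callable


# Hypothetical tools standing in for your real ones.
def fetch_data(quarter: str) -> list[dict[str, Any]]:
    # Ignores `quarter` and returns canned rows for the illustration.
    return [
        {"region": "EMEA", "sales": 120.0},
        {"region": "EMEA", "sales": 80.0},
        {"region": "APAC", "sales": 50.0},
    ]


def compute_summary(rows: list[dict[str, Any]]) -> dict[str, float]:
    # Average sales per region.
    totals: dict[str, list[float]] = {}
    for row in rows:
        totals.setdefault(row["region"], []).append(row["sales"])
    return {region: sum(vals) / len(vals) for region, vals in totals.items()}


TOOLS: dict[str, Callable[..., Any]] = {
    "fetch_data": fetch_data,
    "compute_summary": compute_summary,
}


def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a call from generated code to a registered host function."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


# The kind of program a model might write: every step and the glue in one pass.
MODEL_PROGRAM = """
rows = call_tool("fetch_data", quarter="Q1")
rows = [r for r in rows if r["sales"] > 60]  # intermediate filtering in plain code
result = call_tool("compute_summary", rows=rows)
"""


def run_program(source: str) -> Any:
    # NOT a sandbox: plain exec() stands in for the Hyperlight micro-VM here.
    namespace: dict[str, Any] = {"call_tool": call_tool}
    exec(source, namespace)
    return namespace["result"]


print(run_program(MODEL_PROGRAM))  # {'EMEA': 100.0}
```

The filtering step never leaves the program. In a standard loop, the raw rows would have round-tripped through the model before the next tool call.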
The Hyperlight Sandbox: Security Isn’t an Afterthought
The obvious objection to running model-generated code is safety. What stops the model from writing something destructive? The answer is Hyperlight, Microsoft’s open-source micro-VM project (CNCF Sandbox since February 2025).
Each execute_code call spins up a fresh, hardware-isolated micro-VM with a cold start under two milliseconds. That is negligible next to a model API round-trip, so isolation adds no meaningful latency. Inside the VM:
- No access to the host filesystem beyond paths you explicitly mount
- No network access beyond domains you explicitly allow
- If the guest crashes or misbehaves, it hits a hardware wall — the host is untouched
This is VM-level isolation, not process sandboxing. Hyperlight was built to run untrusted code safely inside Microsoft's own infrastructure, and model-generated code is exactly that threat model.
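Conceptually, the mount and network rules above are default-deny allowlists. The sketch below models those semantics in plain Python. The `SandboxPolicy` class and its fields are hypothetical and are not Hyperlight's configuration API. It illustrates only what "explicitly mount" and "explicitly allow" mean, not how Hyperlight enforces them.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical default-deny policy: nothing is reachable unless listed."""

    mounted_paths: tuple[str, ...] = ()
    allowed_domains: tuple[str, ...] = ()

    def allows_path(self, path: str) -> bool:
        # Normalize first so "/data/../etc/passwd" cannot pass as a path under /data.
        candidate = PurePosixPath(posixpath.normpath(path))
        for root in self.mounted_paths:
            mount = PurePosixPath(posixpath.normpath(root))
            if candidate == mount or mount in candidate.parents:
                return True
        return False  # default deny

    def allows_url(self, url: str) -> bool:
        # Exact host match only; subdomains must be listed explicitly.
        host = urlparse(url).hostname or ""
        return host in self.allowed_domains  # default deny


policy = SandboxPolicy(mounted_paths=("/data",), allowed_domains=("api.example.com",))
print(policy.allows_path("/data/q1.csv"))               # True
print(policy.allows_path("/data/../etc/passwd"))        # False
print(policy.allows_url("https://api.example.com/v1"))  # True
print(policy.allows_url("https://evil.example.net/"))   # False
```

The empty defaults are the important design choice: a policy constructed with no arguments reaches nothing at all.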
Getting Started
CodeAct ships in agent-framework-hyperlight, currently in alpha. Linux and Windows are supported today; macOS support is on the way.
```shell
pip install agent-framework-hyperlight --pre
# or with uv:
uv add --prerelease=allow agent-framework-hyperlight
```
Two usage patterns are available. The Provider pattern (HyperlightCodeActProvider) is the recommended default for production: use it when your tool registry may change between runs, or when multiple agents share resources. Manual static wiring is simpler — pass execute_code alongside your fixed tool list directly — and works well when the tool set doesn’t change.
```python
import asyncio

from agent_framework import AssistantAgent
from agent_framework_hyperlight import HyperlightCodeActProvider

# Assumed to exist already: your tool functions and a model client named `model`.
async def main() -> None:
    # Wrap your tools in the provider
    provider = HyperlightCodeActProvider(
        tools=[fetch_data, filter_records, compute_summary, format_output]
    )
    # The agent gets one tool, execute_code: the model writes Python
    # and the Hyperlight VM runs it once
    agent = AssistantAgent("analyst", model_client=model, tools=[provider])
    result = await agent.run("Analyze Q1 sales, compute regional averages, return markdown table")
    print(result)

asyncio.run(main())
```
Full examples are in the Microsoft Learn CodeAct documentation and the framework’s GitHub samples directory.
When to Use CodeAct — and When Not To
CodeAct is a clear win for agents that chain three or more tools per task, especially when intermediate data flows between steps — filtering the output of one tool before passing it to another, for example. Data pipelines, report generation, chained lookups: all strong fits.
Skip it in three cases: when you need per-operation human approval (CodeAct approves the whole program as a unit, not individual calls); when you're on .NET (CodeAct is Python-first for now, with .NET support planned); and for single-tool tasks, which don't benefit enough to justify the added dependency.
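The approval trade-off is easier to see side by side. In this hypothetical sketch, the function names and the `approve` callback are assumptions, not MAF's approval hooks. The first runner asks for one decision covering the whole program, the granularity the article attributes to CodeAct. The second asks once per tool call, the granularity a standard tool-calling loop gives you naturally. `exec()` only stands in for the sandbox and provides no isolation.

```python
from typing import Any, Callable

Approver = Callable[[str], bool]
Tools = dict[str, Callable[..., Any]]


def run_program_approved_once(source: str, tools: Tools, approve: Approver) -> Any:
    """Whole-program granularity: one decision covers every call in `source`."""
    if not approve(source):
        raise PermissionError("program rejected")

    def call_tool(name: str, **kwargs: Any) -> Any:
        return tools[name](**kwargs)  # no further prompts once the program is approved

    namespace: dict[str, Any] = {"call_tool": call_tool}
    exec(source, namespace)  # NOT isolation; stands in for the sandboxed VM
    return namespace["result"]


def run_program_approved_per_call(source: str, tools: Tools, approve: Approver) -> Any:
    """Per-operation granularity: each call is approved or rejected individually."""

    def call_tool(name: str, **kwargs: Any) -> Any:
        if not approve(f"{name}({kwargs})"):
            raise PermissionError(f"call rejected: {name}")
        return tools[name](**kwargs)

    namespace: dict[str, Any] = {"call_tool": call_tool}
    exec(source, namespace)  # NOT isolation
    return namespace["result"]


# Hypothetical tool and program; the approver records each request and approves it.
TOOLS: Tools = {"double": lambda x: 2 * x}
PROGRAM = 'result = call_tool("double", x=1) + call_tool("double", x=2)'
prompts: list[str] = []


def record(request: str) -> bool:
    prompts.append(request)
    return True


print(run_program_approved_once(PROGRAM, TOOLS, record), len(prompts))  # 6 1
prompts.clear()
print(run_program_approved_per_call(PROGRAM, TOOLS, record), len(prompts))  # 6 2
```

With whole-program approval, the reviewer signs off on both `double` calls at once. There is no point at which the second call can be rejected after the first has run.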
One caveat worth naming directly: this is alpha. The official CodeAct announcement is explicit about that. The community GitHub discussion shows active feedback on memory limits, streaming, and approval granularity. Use it to evaluate and prototype; avoid building a hard production dependency on the current API.
The Broader Picture
CodeAct is one piece of the “agent harness” story Microsoft introduced at Build 2026 — a set of first-class primitives including shell access, code execution, and human-in-the-loop approvals designed to move agents from prototype-grade to production-ready. The CodeAct + Hyperlight combination is the code execution slice of that stack.
The underlying idea, that giving the model a sandbox is more efficient than giving it a tool list, isn't specific to Microsoft. LlamaIndex, LangChain, and Hugging Face's smolagents all have CodeAct implementations. What the Microsoft Agent Framework (MAF) adds is Hyperlight: VM-level isolation with millisecond startup, which eliminates the safety tradeoff that made code execution impractical for most production agents. If your agents spend most of their time chaining tools, this is worth evaluating.