
Your AI coding agent is failing in production, and the model isn’t the problem. The problem is handing it 40 tools and saying “figure it out.” A new open-source tool called Statewright, which surfaced on Hacker News this week, applies a decades-old fix: state machines. Restrict what the agent can do based on where it is in the workflow. On a 5-task SWE-bench subset, two local models went from passing 2 out of 10 attempts to 10 out of 10 — with no model changes, no fine-tuning, no bigger hardware.
The Problem With Giving Agents Every Tool
Agentic coding sessions have a predictable failure pattern. The model re-reads the same files three times, calls an edit command during a review phase, or skips tests and heads straight for deployment. These aren’t hallucinations in the traditional sense — they’re workflow failures. The model is capable of doing the right thing; it just isn’t forced to do things in the right order.
This matters more now that GitHub Copilot has switched to token-based billing. Agentic sessions that spin their wheels can burn $30–40 of compute per session, so inefficiency now has a price tag attached.
How Statewright Works
Statewright is a state machine engine written in Rust — deterministic, no LLM in the enforcement loop — that integrates with AI coding agents via MCP (Model Context Protocol). You define a workflow as a series of states. Each state declares which tools the agent is allowed to call. The MCP gateway enforces this at the protocol level.
When the agent tries to call a tool outside its current phase, it gets rejected outright:
Tool 'Edit' is not available in the 'planning' phase.
Allowed tools: Read, Grep, Glob.
To advance, call statewright_transition with: READY -> implementing
This isn’t a system-prompt suggestion the model can reason its way around. It’s a hard block. The model is told what it can do and how to move forward — which is more useful than a flat refusal.
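As a rough illustration of that kind of hard block, here is a minimal Python sketch of a per-phase tool gate. It is not Statewright's code (the real engine is written in Rust, and its internals aren't shown here); the names Phase, PhaseGate, and ToolRejected are hypothetical, and only the wording of the rejection message is taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    allowed_tools: tuple   # tool names callable in this phase
    transitions: dict      # event name -> next phase name


class ToolRejected(Exception):
    """Raised when a tool call falls outside the current phase."""


class PhaseGate:
    """Hypothetical gate: a plain lookup, no LLM involved in the decision."""

    def __init__(self, phases, start):
        self.phases = phases
        self.current = start

    def check(self, tool):
        phase = self.phases[self.current]
        if tool in phase.allowed_tools:
            return  # call may proceed
        # Tell the caller what is allowed and how to move forward.
        hints = ", ".join(
            f"{event} -> {target}" for event, target in phase.transitions.items()
        )
        raise ToolRejected(
            f"Tool '{tool}' is not available in the '{self.current}' phase.\n"
            f"Allowed tools: {', '.join(phase.allowed_tools)}.\n"
            f"To advance, call statewright_transition with: {hints}"
        )


gate = PhaseGate(
    {"planning": Phase(("Read", "Grep", "Glob"), {"READY": "implementing"})},
    start="planning",
)
gate.check("Read")  # allowed, returns silently
try:
    gate.check("Edit")
except ToolRejected as err:
    print(err)
```

The point of the sketch is that the decision is a deterministic membership test on the current phase, so there is nothing for the model to argue with.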
Defining a Workflow
Workflows are defined in YAML. A basic three-phase coding workflow looks like this:
states:
  planning:
    allowed_tools: [Read, Grep, Glob]
    max_iterations: 10
    instructions: "Read and understand the code. Do not edit."
    on:
      READY: implementing
  implementing:
    allowed_tools: [Read, Edit, Write]
    max_edit_lines: 20
    on:
      DONE: testing
      FAIL: planning
  testing:
    allowed_tools: [Bash]
    bash_restrictions: [no_destructive, no_network]
    on:
      PASS: done
      FAIL: implementing
The planning state only exposes read-only tools. The model physically cannot edit a file while planning, regardless of what it decides. When it signals readiness via statewright_transition, it advances to implementing, where edit tools unlock. Testing locks down to Bash only, with destructive operations and network access blocked even then.
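To make the transition logic concrete, the three-phase workflow can be modeled as a plain lookup table. The Python sketch below is illustrative only: WORKFLOW, advance, and run are hypothetical names, and treating done as a terminal state with no outgoing events is an assumption, since the YAML only names it as a target.

```python
# Transition table mirroring the YAML example; "done" is assumed terminal.
WORKFLOW = {
    "planning": {"READY": "implementing"},
    "implementing": {"DONE": "testing", "FAIL": "planning"},
    "testing": {"PASS": "done", "FAIL": "implementing"},
    "done": {},
}


def advance(state, event):
    """Return the next state, or raise if the event is not defined for this state."""
    try:
        return WORKFLOW[state][event]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def run(events, start="planning"):
    """Apply events in order and return the full path of states visited."""
    path = [start]
    for event in events:
        path.append(advance(path[-1], event))
    return path


# One failed test run sends the agent back to implementing before it passes.
print(run(["READY", "DONE", "FAIL", "DONE", "PASS"]))
```

An event that is not listed for the current state, such as PASS while still planning, raises instead of silently moving on, which is the behavior that keeps the agent from skipping tests.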
Statewright also ships a visual editor at statewright.ai where you can see the entire workflow as a graph — failure paths, retry loops, approval gates — and tweak states without editing YAML directly.
The Results
Statewright’s creator ran a 5-task SWE-bench subset using two local models (13.8GB and 19.9GB). Without constraints, 2 out of 10 attempts passed. With Statewright enabled on the same tasks and the same hardware, 10 out of 10 passed. The caveat is worth stating clearly: this is a small subset of SWE-bench, not the full 2,294-instance benchmark. But the directional result is hard to ignore: the models didn’t get smarter, the problem got smaller.
Getting Started
Statewright currently integrates with Claude Code via MCP. Support for Codex, Cursor, opencode, and Pi is in progress. Install via the Claude Code plugin marketplace:
claude /plugin marketplace add statewright/statewright
claude /plugin install statewright
The plugin opens statewright.ai for account creation and API key setup. The Rust engine itself is Apache 2.0 and embeddable with no runtime dependencies, which suits anyone who wants enforcement to run entirely locally and prefers to build their own tooling around it.
Licensing
The core Rust engine is Apache 2.0. The MCP gateway uses FSL-1.1-ALv2, which converts to Apache 2.0 in 2029. There’s a free tier. The primary community concern on Hacker News was the cloud dependency for the gateway — the short answer is that you can self-host the engine; the cloud piece is optional convenience, not a lock-in requirement. Full documentation is available at docs.statewright.ai.
The Bigger Picture
State machines aren’t a new idea — they’re how reliable distributed systems have been designed for decades. Statewright applies that pattern to agentic coding workflows, where the “distributed system” happens to include an LLM as one of the components. Bigger models and longer context windows address raw capability. State machine guardrails address reliability. These are different problems, and in 2026, the reliability gap is where most production failures are coming from. Statewright is an early, practical answer to that gap.













