
Most teams can tell you what their AI agents are supposed to do. Almost none can prove the implementation actually matches that policy. Exabeam’s Praxen, launched June 23 as an Apache 2.0 open-source tool, is the first purpose-built solution for this gap — it takes your declared agent policy and checks it against the evidence: source code, configurations, behavioral logs, and deployment state.
The Problem No One Was Measuring
AI agents are no longer toy demos. They’re running CI jobs, accessing production databases, committing to version control, and managing cloud credentials. According to a 2026 survey of over 900 technical leaders, 88% of organizations have already experienced AI agent security incidents that their existing policies failed to prevent. Meanwhile, 82% of those same leaders felt confident those policies were adequate.
That gap has a name: policy drift. Agents accumulate excess permissions over time. They get configured with tools that technically work but exceed their stated scope. MCP servers get added to deployment configs without updating the security review. No tool was looking for the divergence between “what we wrote in the policy doc” and “what the agent can actually do.”
The numbers make the stakes clear: organizations enforcing least-privilege access for AI agents report a 17% incident rate. Those without it: 76%. That roughly 4.5x difference (76 / 17 ≈ 4.47) isn’t a new lesson; it’s the same principle identity management taught us a decade ago with IAM policies, now applied to agents.
What Praxen Does
Praxen introduces a concept it calls the Worker Remit — a Markdown policy document that forces teams to explicitly declare what an agent is authorized to do before deployment. The remit covers the agent’s mission, its authorized tools, approved channels and counterparties, and explicitly forbidden actions.
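To make the remit idea concrete, here is a minimal sketch of what a remit might look like and how a script could check that it is complete. The section headings, the example agent `release-bot`, and the `parse_remit` helper are all assumptions made for illustration; Praxen's actual remit template and parser may look quite different.

```python
from dataclasses import dataclass, field

# Hypothetical section headings; Praxen's real remit template may differ.
REQUIRED_SECTIONS = (
    "Mission",
    "Authorized Tools",
    "Approved Channels",
    "Forbidden Actions",
)

# Illustrative remit for an imaginary agent, not taken from the Praxen docs.
EXAMPLE_REMIT = """\
# Worker Remit: release-bot

## Mission
- Cut release branches and open changelog pull requests

## Authorized Tools
- git
- github_pull_requests

## Approved Channels
- #release-eng

## Forbidden Actions
- Push directly to main
- Read production database credentials
"""


@dataclass
class Remit:
    name: str = ""
    sections: dict = field(default_factory=dict)

    def missing_sections(self):
        """Return required sections that are absent or have no entries."""
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s)]


def parse_remit(markdown_text):
    """Parse '# title', '## section', and '- item' lines into a Remit."""
    remit = Remit()
    current = None
    for raw in markdown_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            remit.sections[current] = []
        elif line.startswith("# "):
            remit.name = line[2:].strip()
        elif line.startswith("- ") and current is not None:
            remit.sections[current].append(line[2:].strip())
    return remit


remit = parse_remit(EXAMPLE_REMIT)
print(remit.name)
print(remit.sections["Authorized Tools"])
print(remit.missing_sections())
```

Even a toy check like `missing_sections()` shows the value of the structure: a remit with no forbidden actions listed is a gap you can catch before any evidence is collected.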
With a remit in hand, Praxen ingests evidence from the agent’s implementation and environment: source code, deployment configurations, behavioral logs, and MCP server configs. It then produces a gap report — specific findings for every place the observed implementation diverges from declared intent, recommendations for each finding, and an overall maturity score using the RAISE Framework (six categories, 0–5 per category).
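The comparison at the heart of a gap report can be sketched as a pair of set operations: tools observed but never declared, and forbidden actions that show up in the evidence anyway. This is not Praxen's implementation. The `Finding` type, the tool and action names, the placeholder category labels, and the choice to sum the six 0–5 scores are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    kind: str            # "undeclared_tool" or "forbidden_action_observed"
    subject: str         # the tool or action the finding is about
    recommendation: str


def gap_report(authorized_tools, forbidden_actions, observed_tools, observed_actions):
    """Compare declared intent against observed evidence using set operations."""
    findings = []
    # Tools present in the evidence but absent from the remit.
    for tool in sorted(set(observed_tools) - set(authorized_tools)):
        findings.append(Finding(
            "undeclared_tool", tool,
            f"Remove '{tool}' or add it to the remit after review",
        ))
    # Actions the remit forbids that appear in the evidence anyway.
    for action in sorted(set(observed_actions) & set(forbidden_actions)):
        findings.append(Finding(
            "forbidden_action_observed", action,
            f"Block '{action}' at the permission layer",
        ))
    return findings


def total_score(category_scores):
    """Sum six per-category scores (0-5 each). Summing is an assumed aggregation rule."""
    if len(category_scores) != 6:
        raise ValueError("expected exactly six categories")
    for name, score in category_scores.items():
        if not 0 <= score <= 5:
            raise ValueError(f"score for {name} out of range: {score}")
    return sum(category_scores.values())


findings = gap_report(
    authorized_tools=["git", "github_pull_requests"],
    forbidden_actions=["push_to_main"],
    observed_tools=["git", "github_pull_requests", "aws_cli"],
    observed_actions=["open_pull_request", "push_to_main"],
)
for f in findings:
    print(f.kind, f.subject, "->", f.recommendation)

# Placeholder labels; the real RAISE category names are not reproduced here.
scores = {"category_1": 3, "category_2": 2, "category_3": 4,
          "category_4": 1, "category_5": 3, "category_6": 2}
print(total_score(scores), "out of 30")
```

A real audit has to infer the observed tools and actions from code, configs, and logs, which is the hard part. The sketch only shows why an explicit remit makes the final comparison mechanical.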
Each finding is tagged against the OWASP Top 10 for Agentic AI Applications 2026 and — when MCP configuration is present — the OWASP Secure MCP Server Development Guide 2026. Under the EU AI Act, documenting intended vs. actual agent behavior is increasingly a compliance requirement for high-risk systems. A Worker Remit plus a Praxen report is a natural artifact for that.
How to Run It
Praxen runs as a skill plugin for Claude Code or OpenAI Codex. No pip install required — one command adds the plugin and you point it at the agent you want to evaluate. It requires Python 3.9+ and a Sonnet-class model or higher on Claude Code, or a GPT-5-class equivalent on Codex.
The Worker Remit is worth writing even if you never run Praxen. Forcing your team to write down exactly what the agent may and may not do — in a structured document, before deployment — is the kind of discipline that IAM policy reviews made standard for cloud resources. Agents deserve the same rigor.
What It Doesn’t Solve
Praxen is a pre-deployment check, not a runtime monitor. Exabeam’s Agent Behavior Analytics product handles the runtime layer. Praxen answers “did we build the agent we intended?” — ABA answers “is the deployed agent behaving as intended in production?” You need both.
The tool is also only as good as the evidence you give it. Behavioral logs improve the analysis significantly; early-stage agents that haven’t generated logs yet will get a less complete picture. And the Worker Remit quality matters: a vague policy document produces a vague gap report. Launched a week ago, Praxen hasn’t been battle-tested at enterprise scale yet. Treat it as a strong v1.0 for a discipline that previously had no dedicated tooling.
Why This Week Matters
CVE-2026-12957 dropped in June: Amazon Q Developer auto-loaded untrusted MCP server configurations from workspace files and executed them without user consent, handing attackers the developer’s AWS credentials, SSH keys, and cloud tokens. The root issue — an agent trusting configuration it shouldn’t, with no verification of scope — is exactly what a pre-deployment audit like Praxen would surface before production.
That CVE isn’t an outlier. Check Point Research found similar auto-execution vulnerabilities in Claude Code. OX Security found one in Windsurf. The pattern is consistent: agents deployed with more trust than their configuration actually enforces. Praxen doesn’t prevent vulnerabilities baked into a dependency — but it gives teams the tooling to ask “does this agent’s trust boundary match what we intended?” before shipping.
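The trust-boundary question above can be turned into a small check: list the MCP servers a workspace config would load and flag any that the remit never approved. The sketch below assumes a JSON config with a top-level `mcpServers` object, a shape several MCP clients use. The file contents, server names, and the `undeclared_mcp_servers` helper are hypothetical, and it says nothing about how Praxen itself inspects MCP configuration.

```python
import json

# Hypothetical workspace config, for illustration only.
WORKSPACE_CONFIG = """
{
  "mcpServers": {
    "github": {"command": "github-mcp", "args": []},
    "shell-helper": {"command": "sh", "args": ["-c", "./tools/helper.sh"]}
  }
}
"""


def undeclared_mcp_servers(config_text, approved_servers):
    """Return MCP server names present in the config but not approved in the remit."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    approved = set(approved_servers)
    return sorted(name for name in servers if name not in approved)


# The remit approves only the "github" server, so "shell-helper" is flagged.
print(undeclared_mcp_servers(WORKSPACE_CONFIG, ["github"]))
```

A check like this would not have patched the underlying bug, but it makes an unexpected server entry visible at review time rather than after it has run.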
The Bottom Line
If your team is deploying AI agents — and by mid-2026, most are — Praxen earns a spot in your pre-deployment checklist alongside dependency scanning and SAST. It’s early, it needs good inputs to produce good outputs, and it doesn’t replace runtime monitoring. But the Worker Remit concept alone is worth the time to write one: it forces the kind of explicit scope declaration that should precede any autonomous system reaching production.
The tool that makes the gap between policy and implementation visible is the tool you need before that gap becomes a CVE. The Praxen repository is on GitHub under Apache 2.0.