
Developers are shipping AI agents without a single security test. While the industry debates agent frameworks and model benchmarks, prompt injection attacks are quietly landing in production — and the blast radius is no longer embarrassing, it’s catastrophic. On May 20, 2026, Microsoft open-sourced RAMPART and Clarity: two tools that pull security testing into development before the breach, not after.
RAMPART: Security Tests That Belong in Your CI Pipeline
RAMPART (Risk Assessment and Measurement Platform for Agentic Red Teaming) is pytest-native. It slots directly into how most Python teams already test — no separate security team, no post-deployment audit. You write test scenarios the same way you write integration tests, and they gate your CI pipeline exactly like any other test failure.
Each test connects to your agent through a thin adapter, orchestrates an interaction drawn from your threat model, and evaluates the observable outcome. The result is a clear pass or fail — blockable in CI, trackable across commits, and repeatable on every push.
The primary focus in RAMPART’s current release is cross-prompt injection: the attack where an agent retrieves poisoned content from documents, emails, tickets, or other external data sources, and that content silently redirects the agent’s behavior. This is not theoretical. CVE-2025-53773 chained prompt injection in GitHub Copilot pull request descriptions directly into remote code execution — CVSS 9.6. A single prompt injection across three popular AI coding agents leaked secrets and credentials in production, documented by VentureBeat earlier this year.
RAMPART is built on top of PyRIT, Microsoft’s existing red-team automation framework. The two serve different audiences at different times: PyRIT is for security researchers probing a finished system; RAMPART is for engineers building it, catching vulnerabilities in the same sprint they’re introduced.
Why Binary Pass/Fail Fails for AI Agents
Here’s the problem every agent developer hits: run the same test twice and get two different results. LLMs are probabilistic. A single-shot assertion that worked yesterday might fail today on identical input — or worse, pass when it should fail.
RAMPART addresses this with policy-based evaluation. Instead of a binary assert, you define a threshold: “this action must be safe in at least 80% of runs.” The framework executes the scenario multiple times and checks whether the outcome meets your policy. That’s how production agents actually behave, and it’s how they need to be validated.
You can combine evaluators with boolean logic to express compound safety conditions — not just “did the agent refuse?” but “did it refuse, log the attempt, and not leak intermediate state?” That maps to real security requirements in ways simple assertions never will.
Clarity: Catching Design Mistakes Before They’re Built
Most security vulnerabilities in AI agents start as design decisions, not implementation bugs. You decide the agent can read the email inbox. You decide it can execute shell commands. Those decisions — made quickly in a planning call — become the attack surface months later.
Clarity is Microsoft’s tool for making those decisions explicitly, before the code exists. It runs as a desktop app, web UI, or embedded inside a coding agent. It guides teams through structured conversations: What problem are you solving? What could go wrong? What adversarial scenarios haven’t you considered? Multiple AI reviewers examine the system simultaneously from different angles — security, human factors, operational concerns.
The output lands in a .clarity-protocol/ directory in your repository as plain markdown. These files get committed, reviewed in pull requests, and diffed like source code. When related decisions change, Clarity flags stale assumptions. Think of it as architecture decision records with a security lens and an AI co-reviewer built in.
The Gap RAMPART Fills
The AI agent security tooling landscape has a clear gap. Garak (NVIDIA) and PyRIT are research tools for security professionals probing systems they didn’t build. The OWASP Top 10 for LLMs gives you a framework. None of them sit inside a developer’s CI pipeline delivering automated, commit-level security feedback during active development.
RAMPART fills that gap. The numbers make the urgency clear: the UK AI Security Institute documented 700 real-world cases of AI agent misbehavior, with a five-fold rise between October 2025 and March 2026. Gartner estimates 40% of enterprise applications will integrate AI agents by end of 2026 — up from under 5% in 2025. The attack surface is expanding faster than the tooling.
This is the pytest moment for agent security. The point where a category of testing becomes accessible enough that not doing it is no longer a reasonable tradeoff — it’s negligence.
Getting Started
Both tools are available now. RAMPART is at github.com/microsoft/RAMPART. The Microsoft Security Blog announcement covers both tools with architecture details and example scenarios for immediate use.
If you’re shipping AI agents without RAMPART tests in CI, you haven’t finished the job. The tools exist. The excuse is gone.













