Microsoft MDASH in Production: AI Agents Are Finding Real Windows CVEs

Microsoft MDASH: 100+ AI agents hunting vulnerabilities in production across Windows and Azure

Microsoft’s multi-model agentic scanning harness, codenamed MDASH, stopped being a research project on June 17, 2026. It now runs in production across Windows, Azure, and identity engineering workflows, surfacing vulnerabilities that are patched in regular Patch Tuesday cycles. If you’re on the Microsoft stack and haven’t heard of MDASH yet, you will soon: its findings are built to flow directly into developer pipelines.

What MDASH Has Already Found

The May 2026 Patch Tuesday included 16 CVEs discovered by MDASH: 10 kernel-mode and 6 user-mode, four of them rated Critical. Two stand out:

CVE-2026-33824 (CVSS 9.8): An IKEv2 double-free requiring no authentication. Unauthenticated remote exploitability at that severity is about as bad as it gets.
CVE-2026-33827 (CVSS 8.1): A use-after-free in tcpip.sys triggered by crafted IPv4 packets with the Strict Source and Record Route option — reachable from the network without credentials.
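
For defenders who want to see where such packets would show up on the wire, here is a minimal sketch that spots the Strict Source and Record Route option in a raw IPv4 header. The option type (137) and the header layout come from RFC 791; the function names are made up for illustration, and nothing here reflects the actual trigger condition for CVE-2026-33827 or any Microsoft detection logic.

```python
SSRR_OPTION_TYPE = 137  # RFC 791: Strict Source and Record Route


def ipv4_option_types(header: bytes) -> list[int]:
    """Return the option type bytes found in a raw IPv4 header."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    header_len = (header[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if header_len < 20 or header_len > len(header):
        raise ValueError("invalid IHL")

    options = header[20:header_len]
    found = []
    i = 0
    while i < len(options):
        opt = options[i]
        if opt == 0:  # End of Option List
            break
        if opt == 1:  # No Operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option")
        length = options[i + 1]  # covers the type and length bytes too
        if length < 2 or i + length > len(options):
            raise ValueError("malformed option length")
        found.append(opt)
        i += length
    return found


def has_ssrr(header: bytes) -> bool:
    """True if the header carries a Strict Source and Record Route option."""
    return SSRR_OPTION_TYPE in ipv4_option_types(header)


# A 28-byte header (IHL = 7) with one SSRR option: type, length, pointer,
# one 4-byte route entry, then an End of Option List byte as padding.
sample = bytes([0x47]) + bytes(19) + bytes([137, 7, 4, 10, 0, 0, 1, 0])
print(has_ssrr(sample))
```

Source-routed packets are rare on most modern networks, so a check like this is more useful for flagging or logging unusual traffic than as a substitute for the patch.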

The June batch is broader. Microsoft’s June 17 announcement lists Hyper-V, the Windows kernel, Active Directory Domain Services, Remote Desktop Client, HTTP.sys, DNS Client, and DHCP Client as targets. MDASH isn’t picking low-hanging fruit — it’s going after the deep infrastructure that defenders worry about most.

How the Architecture Works

This is the part worth understanding, because it explains why MDASH achieves something traditional static analysis tools don’t: a false positive rate low enough to be useful in production developer workflows.

MDASH runs three specialized agent types in sequence:

Auditor agents scan code for suspicious patterns and generate vulnerability hypotheses. Powered by smaller models (Phi-3, Llama-3-8B) for cost-efficient breadth scanning at scale.
Debater agents challenge each finding, arguing for and against exploitability. These run on frontier models with the reasoning depth to stress-test edge cases.
Prover agents construct actual triggering inputs confirming a flaw is exploitable. No proof, no escalation.
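
The three stages above can be sketched as a simple gated pipeline. This is a toy illustration under stated assumptions, not Microsoft’s implementation: the `Finding` type, the `run_pipeline` function, and the three stub agents are all hypothetical, and string checks stand in for what would be model calls in the real system.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    location: str
    snippet: str
    survived_debate: bool = False
    proof_input: Optional[bytes] = None


def run_pipeline(
    source: str,
    audit: Callable[[str], list[Finding]],
    debate: Callable[[Finding], bool],          # True if it could not be refuted
    prove: Callable[[Finding], Optional[bytes]],  # triggering input, or None
) -> list[Finding]:
    """Run auditor -> debater -> prover and keep only proven findings."""
    escalated = []
    for finding in audit(source):
        finding.survived_debate = debate(finding)
        if not finding.survived_debate:
            continue  # refuted in debate, dropped as noise
        finding.proof_input = prove(finding)
        if finding.proof_input is None:
            continue  # no proof, no escalation
        escalated.append(finding)
    return escalated


# Stub agents (hypothetical): trivial string heuristics instead of models.
def toy_audit(source: str) -> list[Finding]:
    return [
        Finding(f"line {n}", line)
        for n, line in enumerate(source.splitlines(), 1)
        if "memcpy" in line
    ]


def toy_debate(finding: Finding) -> bool:
    # Treat a sizeof-bounded copy as a successful refutation.
    return "sizeof" not in finding.snippet


def toy_prove(finding: Finding) -> Optional[bytes]:
    # Pretend only attacker-controlled lengths yield a working trigger.
    return b"A" * 64 if "user_len" in finding.snippet else None


code = (
    "memcpy(dst, src, user_len);\n"
    "memcpy(dst, src, sizeof(dst));\n"
    "memcpy(dst, src, cfg_len);"
)
for f in run_pipeline(code, toy_audit, toy_debate, toy_prove):
    print(f.location, f.snippet)
```

All three lines are flagged by the auditor stub, but only the first survives both the debate and the proof step, which is the filtering behavior the article describes.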

Microsoft’s key design insight: disagreement between models is itself a signal. When the auditor flags something and the debater can’t refute it, the system’s confidence in that finding rises. The result is a pipeline that catches real bugs and filters noise, which addresses the core problem that made traditional shift-left security miserable for developers: undifferentiated alert volume with no proof of exploitability.

Where Findings Land in Your Workflow

MDASH findings don’t sit in a security team’s spreadsheet. They flow into developer tooling:

Findings surface in the Microsoft Defender Portal, enriched with production risk signals — internet exposure, data sensitivity — so you know which issues are genuinely hot.
High-priority findings become work items in Azure DevOps, optionally gating pipeline builds. They’ll appear in your sprint, not a quarterly audit report.
GitHub Copilot Autofix generates a suggested PR for flagged alerts via GitHub Advanced Security for Azure DevOps. The full loop: vulnerability found → Defender alert → work item → Copilot generates fix PR → you review and merge → alert auto-resolves after the next scan.
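
The loop above can be modeled as a small state machine. This is a sketch only: the stage names mirror the article’s description, while the `TrackedFinding` class, the priority rule, and the build-gating check are assumptions made for illustration and do not correspond to any Defender, Azure DevOps, or GitHub API.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    FOUND = "vulnerability found"
    ALERT = "Defender alert"
    WORK_ITEM = "work item"
    FIX_PR = "Copilot fix PR"
    MERGED = "reviewed and merged"
    RESOLVED = "alert auto-resolved"


ORDER = list(Stage)  # Enum iteration preserves definition order


@dataclass
class TrackedFinding:
    title: str
    internet_exposed: bool
    sensitive_data: bool
    stage: Stage = Stage.FOUND

    def is_high_priority(self) -> bool:
        # Hypothetical rule: any production risk signal raises priority.
        return self.internet_exposed or self.sensitive_data

    def advance(self) -> Stage:
        """Move to the next stage of the remediation loop."""
        idx = ORDER.index(self.stage)
        if idx == len(ORDER) - 1:
            raise ValueError("finding is already resolved")
        self.stage = ORDER[idx + 1]
        return self.stage


def blocks_build(finding: TrackedFinding) -> bool:
    """Gate the pipeline while a high-priority finding is still unmerged."""
    unmerged = ORDER.index(finding.stage) < ORDER.index(Stage.MERGED)
    return finding.is_high_priority() and unmerged


finding = TrackedFinding("double-free in parser", True, False)
while finding.stage is not Stage.RESOLVED:
    print(f"{finding.stage.value:22} gate build: {blocks_build(finding)}")
    finding.advance()
```

Modeling the stages explicitly makes it easy to see where a finding is stuck, for example a work item that never got a fix PR.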

Security review stops being an episodic checkpoint — something that happens before a release — and becomes a continuous engineering loop running in the background. That’s the real shift.

The Benchmark Numbers

MDASH scored 96.55% on CyberGym, UC Berkeley’s benchmark built on 1,507 real-world vulnerability reproduction tasks across 188 open-source projects. That’s up from 88.45% at launch in May, a gain of roughly 8 points in three weeks. It outperforms Anthropic’s Claude Mythos on the same benchmark, which is notable given how heavily Microsoft otherwise leans on Anthropic’s model lineup.
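
As a quick sanity check on the scale of those scores, the sketch below converts the two reported percentages into a point gain and approximate solved-task counts. It assumes all 1,507 tasks are weighted equally, which the article does not state, so the task counts are estimates.

```python
from decimal import Decimal

TOTAL_TASKS = 1507
launch_score = Decimal("88.45")   # percent, as reported for May
current_score = Decimal("96.55")  # percent, as reported for June


def approx_solved(score_percent: Decimal) -> int:
    """Estimate solved tasks, assuming equal weighting per task."""
    return round(score_percent / 100 * TOTAL_TASKS)


gain_points = current_score - launch_score
print(gain_points)                   # 8.10
print(approx_solved(launch_score))   # 1333
print(approx_solved(current_score))  # 1455
```

Under that assumption, the gain works out to about 122 additional tasks solved.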

What Developers Should Do Now

If your organization runs on Azure DevOps and GitHub, take three concrete steps:

Enroll in the MDASH expanded preview. Contact your Microsoft account representative. The expanded preview is open to eligible organizations following the Build 2026 announcement.
Connect GitHub Code Security to Microsoft Defender. The native integration routes MDASH findings into the Defender Portal. It requires configuration but no new tooling.
Enable Copilot Autofix for GitHub Advanced Security. Autofix turns CodeQL alerts into suggested PRs automatically, closing the last mile of the remediation loop.

MDASH represents a genuine capability shift, not another “AI will transform security” announcement that never ships. AI is now finding vulnerabilities and proving their exploitability at hyperscale inside Microsoft’s own production systems. The bottleneck used to be human pen testers with limited hours. Now it’s more than 100 agents running continuously. Prepare your sprint backlog accordingly.

ByteBot
