In a two-week partnership announced yesterday, Anthropic’s Claude Opus 4.6 discovered 22 security vulnerabilities in Mozilla Firefox, including 14 high-severity bugs. The AI found distinct classes of logic errors that had evaded decades of traditional fuzzing, static analysis, and security review, demonstrating that AI can now surface novel vulnerabilities in one of the world’s most heavily tested codebases.
This matters because Firefox serves hundreds of millions of users and has been hammered by security researchers for years. If Claude can find bugs that humans missed, every developer should consider running AI security audits on their projects, especially since attackers likely already have similar capabilities.
22 CVEs From Scanning 6,000 C++ Files
Over two weeks, Claude Opus 4.6 scanned nearly 6,000 C++ files in Firefox’s codebase and submitted 112 unique bug reports. This produced 22 CVEs, with 14 classified as high-severity. Additionally, Claude discovered 90 other non-security bugs. All patches landed in Firefox 148, released February 24, 2026.
The speed is striking. Claude found a use-after-free memory vulnerability after just 20 minutes exploring the JavaScript engine. Mozilla’s security team praised the quality: Anthropic provided “minimal test cases” that allowed instant verification and reproduction, and engineers began landing fixes within hours, not days or weeks.
What makes this significant: Claude found “distinct classes of logic errors that fuzzers had not previously uncovered,” according to Anthropic. Despite decades of traditional fuzzing, static analysis, and security reviews, Firefox still had exploitable bugs. A Mozilla engineer called Anthropic’s reports “better even than our usual internal and external fuzzing bugs.” This isn’t pattern-matching—it’s reasoning-based analysis.
$4,000 vs Tens of Thousands
The entire security audit cost approximately $4,000 in API credits. Compare that to traditional security audits costing tens of thousands of dollars and taking months to complete. Individual developers on Hacker News report running Claude security reviews for as little as $3 per audit.
This democratizes security testing. Small open-source projects, startups, and individual developers can now afford comprehensive audits. Anthropic found 500+ vulnerabilities across open-source codebases—many projects likely have similar issues waiting to be discovered. As one Hacker News commenter warned: “You should assume the bad guys have already done it to your project.”
Can’t Exploit What It Finds—Yet
Here’s the critical limitation: Claude excels at finding vulnerabilities but struggles to exploit them. Despite spending $4,000 in API credits across hundreds of attempts, Anthropic successfully weaponized only 2 out of 22 discovered vulnerabilities into working exploits.
Anthropic emphasizes that “this moment favors defenders,” but warns the advantage won’t last. Detection capabilities far exceed exploitation capabilities right now. That’s your window. Strengthen security immediately while this edge exists, because the arms race is real and the gap is closing.
This isn’t theoretical. It’s a practical call to action backed by concrete results.
How to Use Claude for Security Testing Today
Claude Code Security is available now for all Claude Code users with Pro, Max, or API access. Run `/security-review` in your project directory, or integrate automated security reviews into GitHub Actions to scan every pull request.
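A pull-request-triggered workflow might look like the sketch below. This is a hedged example, not Anthropic’s published configuration: the action name (`anthropics/claude-code-security-review`), its inputs, and the `CLAUDE_API_KEY` secret name are assumptions to verify against the official docs before use.

```yaml
# .github/workflows/security-review.yml
# Sketch only — confirm the action name and inputs in Anthropic's docs.
name: Claude security review
on:
  pull_request:

jobs:
  security-review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write  # allows the action to comment findings on the PR
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-security-review@main  # assumed action name
        with:
          claude-api-key: ${{ secrets.CLAUDE_API_KEY }}    # assumed input name
```

Gating on `pull_request` means every proposed change is scanned before merge, which matches the continuous-monitoring model the article describes.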
The tool uses reasoning-based analysis—understanding how components interact and tracing data flow—rather than simple pattern-matching. It checks for SQL injection, XSS, authentication and authorization flaws, insecure data handling, and dependency vulnerabilities. In testing, it found 500+ vulnerabilities in production open-source codebases.
Best practices from the developer community: Provide detailed documentation (security.md, threat model) because specification-driven approaches work best. Test against known CVEs first to establish baseline performance. Complement automated reviews with manual code reviews—AI augments human expertise, it doesn’t replace it.
A comparable tool in this space, Shannon AI Pentester, previously achieved a 96% success rate at a cost of about $50 per run; the Firefox engagement, by contrast, demonstrates the approach at enterprise scale.
What This Means for Browser Security
Firefox won’t be the last browser to undergo AI security audits. Chrome, Safari, and Edge teams are watching. Gartner predicts that by 2028, more than 50% of enterprises will use AI Security Platforms, up from less than 10% today.
The trend shifts from periodic human audits to continuous AI-assisted monitoring. Security testing is moving into CI/CD pipelines. GitHub Actions integration makes this seamless—every PR gets scanned automatically, catching vulnerabilities before they ship.
Mozilla may continue partnering with Anthropic. The results justify it: 14 high-severity bugs represent roughly 20% of all high-severity Firefox vulnerabilities remediated in 2025. That’s not incremental improvement—that’s a force multiplier for security teams.
Notably, this follows Firefox’s recent work on bitflip-related crashes, which accounted for roughly 10% of crash incidents, underscoring Mozilla’s continued focus on stability and security across multiple fronts.
Community Reaction: Cautiously Optimistic
The Hacker News discussion (356 points) reveals developer sentiment: cautious optimism mixed with healthy skepticism. Key concerns include AI’s tendency to misidentify security boundaries and its struggles with bugs that require cross-feature interactions. However, all 22 vulnerabilities were genuine—every finding crashed the browser or triggered assertion failures.
Developers praised the quality over quantity approach. This wasn’t AI spam flooding Mozilla’s bug tracker. These were reproducible test cases with detailed descriptions, each verified and patched. The community consensus: AI should augment human security expertise, not replace it. When done well, as Anthropic demonstrated, it produces high-quality results that complement traditional security methods.
Key Takeaways
- AI security testing crossed a threshold—finding novel bugs in heavily-tested code is now viable
- Cost-efficient at scale—$3-$4,000 for comprehensive audits vs tens of thousands traditionally
- Detection beats exploitation—defenders have temporary advantage while AI struggles with exploit development
- Accessible to all—run `/security-review` in Claude Code or integrate GitHub Actions
- Act now—attackers likely have similar tools; defenders’ window won’t last
- Complement, don’t replace—AI augments human expertise but requires oversight