On April 14, 2026, security engineer Drew Breunig published a provocative thesis: modern cybersecurity now resembles cryptocurrency’s proof-of-work system. Instead of securing systems through clever engineering, defenders must outspend attackers on the computational tokens used to discover vulnerabilities. The catalyst is Anthropic’s Claude Mythos Preview, announced April 7, the first AI model to autonomously complete a 32-step corporate network attack simulation, at $12,500 per attempt. UK AI Security Institute (AISI) testing confirms that Mythos finds zero-day vulnerabilities, develops functional exploits, and shows “no signs of diminishing returns” as token budgets increase. Security is no longer about being smart. It’s about being rich.
AI Hacking Capabilities Are Real, Not Hype
Mythos Preview completed the UK AISI’s 32-step corporate network attack simulation in 3 of 10 attempts, averaging 22 of 32 steps across runs. The simulation normally takes a human expert about 20 hours, from initial reconnaissance through full network takeover. No previous AI model had finished it end-to-end.
The performance jump is dramatic. Mythos produced 181 working Firefox exploits, versus predecessor Claude Opus 4.6’s 2 exploits from several hundred attempts, a roughly 90x improvement. On OSS-Fuzz benchmarks, Mythos achieved tier 5 complete control-flow hijacks on 10 fully patched targets where previous models managed single-digit results. It discovered over 1,000 high- and critical-severity vulnerabilities across major operating systems and browsers, including bugs that survived decades of expert review: a 27-year-old OpenBSD TCP SACK denial of service and a 17-year-old FreeBSD NFS remote code execution with root access.
Cost economics tell the story. A comprehensive scan runs approximately $12,500 per attempt at a 100M-token budget. Discovering the OpenBSD vulnerabilities cost under $20,000 across 1,000 runs. A complex Linux exploit was developed in under 24 hours for less than $2,000. Performance keeps improving as token budgets grow to 100M, with no saturation point observed: more tokens mean more vulnerabilities found, at a linear rate or better.
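The arithmetic behind these figures can be made explicit. A minimal sketch, using only the dollar amounts the article quotes; the per-token rate and scan counts are derived here for illustration, not stated anywhere in the source:

```python
# Illustrative cost arithmetic from the article's quoted figures.
# The per-million-token rate and scans-per-year counts are derived,
# not quoted.

SCAN_COST_USD = 12_500       # one comprehensive scan, per the article
SCAN_TOKENS = 100_000_000    # 100M tokens per scan

# Implied rate: dollars per million tokens.
usd_per_million_tokens = SCAN_COST_USD / (SCAN_TOKENS / 1_000_000)

def scans_per_year(annual_budget_usd: float) -> int:
    """Whole comprehensive scans an annual hardening budget buys."""
    return int(annual_budget_usd // SCAN_COST_USD)

print(usd_per_million_tokens)   # 125.0
print(scans_per_year(50_000))   # 4 scans at the low end of the quoted range
print(scans_per_year(200_000))  # 16 at the high end
```

At the quoted $50,000-200,000 annual budgets, that works out to between roughly one comprehensive scan a quarter and one or two a month, which is the scale the rest of the argument turns on.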
Security Becomes an Economic Contest, Not an Engineering Challenge
Breunig’s April 14 thesis argues security has become a “low temperature lottery” where success requires spending more computational tokens discovering vulnerabilities than attackers spend exploiting them. This replaces traditional security philosophy—clever design, defense-in-depth, secure-by-default—with economic competition. If defenders spend 1,000 token-hours hardening their code, attackers must spend 1,000+ token-hours to find remaining vulnerabilities. The system is secured not by brilliant architecture but by budget superiority in a computational arms race.
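One way to make the “low temperature lottery” intuition concrete is to treat each unit of token spend as an independent draw with a small per-draw chance of surfacing a given bug. This is my illustration of the framing, not Breunig’s formal model; the per-draw probability is an arbitrary assumption:

```python
# Sketch of the lottery framing (illustrative, not Breunig's model):
# each unit of token spend is an independent draw with a small
# probability p of surfacing a particular bug.

def p_found(p_per_draw: float, draws: int) -> float:
    """Probability that at least one of `draws` attempts surfaces the bug."""
    return 1.0 - (1.0 - p_per_draw) ** draws

# With a defender making 1,000 draws and patching what it finds,
# an attacker spending half as much is markedly less likely to
# rediscover any given surviving bug.
defender = p_found(1e-3, 1_000)   # ~0.63
attacker = p_found(1e-3, 500)     # ~0.39
print(round(defender, 2), round(attacker, 2))
```

Under this toy model the contest really is purely economic, which is exactly the assumption Antirez challenges below: if bugs are not independent lottery tickets, model intelligence caps what any token budget can buy.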
He predicts a three-phase development cycle will emerge: development (human-driven coding), review (best practices and traditional auditing), and hardening (token-intensive AI vulnerability scanning). That third phase represents a 10-20% cost increase for most projects, roughly $50,000 to $200,000 annually for continuous AI hardening. Traditional security audits cost similar amounts ($50,000-250,000) but, being human-driven, miss bugs humans can’t see. AI hardening finds vulnerabilities that survived decades of expert review, but at a price most teams can’t afford.
The Hacker News discussion (460 points, 170 comments) splits the community. Thomas Ptacek, a prominent security researcher, supports the analogy, noting defenders with source code access gain efficiency advantages over attackers who must repeatedly scan targets. Antirez, Redis creator, challenges a core assumption: “Bugs aren’t like hash collisions. Eventually the cap becomes not M [model runs] but I, the model intelligence level. An inferior model never finds complex multi-step vulnerabilities regardless of token spend.” The debate reveals fundamental uncertainty: does security become pure economics, or does model quality matter more than quantity?
Open Source Gets Strategically Critical (But Only If Funded)
Breunig’s thesis argues collective token spending secures popular open-source libraries better than individual implementations. It extends Linus’s Law from “given enough eyeballs, all bugs are shallow” to “given enough tokens, all bugs become visible.” Popular OSS projects like Linux, OpenSSL, and major browsers benefit from pooled resources for AI hardening. Small teams writing custom code can’t compete with collectively-hardened dependencies that have absorbed millions in token spending from corporate sponsors and foundation grants.
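The pooling argument is back-of-envelope arithmetic. A hypothetical sketch, using the article’s $12,500 scan cost but inventing the contributor counts and per-contributor spend for illustration:

```python
# Why pooled hardening beats per-team hardening (illustrative numbers:
# contributor counts and per-contributor spend are assumptions, only
# the scan cost comes from the article).

SCAN_COST_USD = 12_500

def pooled_scans(contributors: int, per_contributor_usd: float) -> int:
    """Whole comprehensive scans a pool of contributors can fund."""
    return int(contributors * per_contributor_usd // SCAN_COST_USD)

# 1,000 downstream users of a shared library at $500/year each fund
# 40 comprehensive scans of that one codebase; a solo team spending
# the same $500 on its custom reimplementation funds none.
print(pooled_scans(1_000, 500))  # 40
print(pooled_scans(1, 500))      # 0
```

The asymmetry is the whole point: token spend on a shared dependency amortizes across every consumer, while spend on a custom reimplementation protects exactly one codebase.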
This directly contradicts recent arguments from Andrej Karpathy and others advocating LLM reimplementation to reduce dependencies and improve code understanding. In a proof-of-work security model, that strategy is backwards. Trusting collectively-hardened OSS libraries is safer than writing custom code that can’t afford equivalent AI hardening. The dependency calculation inverts: OSS becomes strategically critical infrastructure that must be protected through collective investment.
Industry response acknowledges the crisis. The Linux Foundation announced $12.5 million in grants from Anthropic, AWS, GitHub, Google, Microsoft, and OpenAI in March 2026 to help maintainers handle what they explicitly call “an unprecedented influx of security findings, many generated by automated systems, without the resources or tooling needed to triage and remediate them effectively.” GitHub added $5.5 million in Azure credits through its Secure Open Source Fund. The funding targets Alpha-Omega and OpenSSF to coordinate AI scanning of critical OSS and support maintainer triage capacity. But finding bugs is useless without bandwidth to fix them.
The Uncomfortable Truth: Finding Bugs Isn’t Fixing Them
Mythos discovered over 1,000 high- and critical-severity vulnerabilities during testing. As of April 2026, over 99% remain unpatched, not because maintainers are negligent, but because coordinated responsible disclosure takes time (90 days minimum) and vendor patching capacity is limited. The Hacker News discussion surfaced a critical bottleneck: AI can discover thousands of vulnerabilities, but a small team has the organizational bandwidth to safely patch only 5-10 of them without breaking production systems.
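The triage problem this creates is essentially a prioritization queue. A hypothetical sketch, assuming a simple severity-plus-exploitability ranking; the field names, IDs, and scoring are my illustration, not any real scanner’s schema:

```python
# Hypothetical triage sketch: thousands of AI findings, single-digit
# patch capacity per cycle. Field names, IDs, and scores are invented
# for illustration.

from dataclasses import dataclass

@dataclass
class Finding:
    finding_id: str
    severity: float      # e.g. a CVSS-style base score
    exploitable: bool    # triager's judgment: reachable in production?

def patch_queue(findings: list[Finding], capacity: int) -> list[Finding]:
    """Pick the highest-severity exploitable findings, up to patch capacity."""
    actionable = [f for f in findings if f.exploitable]
    actionable.sort(key=lambda f: f.severity, reverse=True)
    return actionable[:capacity]

findings = [
    Finding("HYP-0001", 9.8, True),
    Finding("HYP-0002", 7.5, False),  # theoretical only: deferred
    Finding("HYP-0003", 8.1, True),
]
print([f.finding_id for f in patch_queue(findings, capacity=2)])
# ['HYP-0001', 'HYP-0003']
```

The ranking itself is the easy part; as the next paragraph notes, deciding which findings are genuinely exploitable and which fixes can deploy safely is the human judgment that doesn’t scale with token budget.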
Google DeepMind’s CodeMender attempts to address this gap by automatically generating patches for discovered vulnerabilities. Chrome security teams report success using Big Sleep to autonomously find and CodeMender to fix deep, exploitable bugs. But complex vulnerabilities still require human review. Automated patch generation doesn’t solve organizational triage challenges: which bugs are exploitable versus theoretical? Which fixes can deploy safely? What’s the testing burden for each patch?
This creates security theater risk. Passing an AI hardening audit doesn’t mean you’re secure if discovered vulnerabilities remain unpatched. The real bottleneck isn’t finding bugs—AI handles that efficiently. The bottleneck is organizational capacity to review findings, test fixes, and deploy patches without destabilizing production systems. Token spending discovers vulnerabilities at scale, but fixing them still requires human judgment and organizational processes that don’t scale linearly with budget.
Is This Arms Race Sustainable?
The proof-of-work security model assumes defenders can outspend attackers. But what happens when well-funded attackers—nation-states, organized cybercrime, competitors—match or exceed defender token budgets? Mythos Preview is currently restricted to Project Glasswing partners (critical infrastructure and OSS developers), but attackers may develop equivalent models independently or wait for inevitable leaks. Security becomes a function of capital, and attackers with deep pockets get the same tools defenders use.
For most organizations, continuous AI hardening at $50,000-200,000+ annually is prohibitive. This creates a two-tier security landscape where large enterprises can afford token-intensive hardening while small companies, startups, and indie developers get priced out. Security inequality emerges by economic design. Small projects become disproportionately attractive targets because attackers know they lack AI hardening budgets.
Alternative approaches exist but require different trade-offs. Formal verification (seL4 microkernel, CompCert compiler) mathematically proves correctness, eliminating vulnerabilities by design rather than discovery, but demands organizational maturity and development overhead most projects can’t support. Memory-safe languages like Rust eliminate entire vulnerability classes (buffer overflows, use-after-free) through compiler guarantees, but require rewriting legacy code. Architectural simplicity (WireGuard’s 4,000 lines versus OpenVPN’s 100,000) reduces attack surface, making audits cheaper, but only works when building new systems from scratch.
The UK AISI evaluation includes a critical caveat: “Mythos Preview can exploit systems with weak security posture.” Test environments lacked active defenders, defensive tooling, and security monitoring. Real-world hardened systems with 24/7 SOC teams present harder targets. Defense-in-depth, least privilege, and architectural security still matter. Proof-of-work security is layer four, not layer one. Organizations that skip traditional security fundamentals and jump straight to AI hardening are building on unstable foundations.
Key Takeaways
- Claude Mythos Preview proves AI can autonomously execute 32-step network attacks (30% success rate), produce 181 Firefox exploits (90x improvement over previous models), and discover 1,000+ critical vulnerabilities at $12,500 per comprehensive scan. This capability is real, not hype, with independent UK AISI validation.
- Security economics shift from clever engineering to computational token spending. Defenders must outspend attackers in AI scanning budgets ($50,000-200,000+ annually for continuous hardening). No diminishing returns observed: more tokens = more bugs found, making security a function of budget, not expertise.
- Open-source libraries become strategically critical through collective token hardening. Popular OSS projects benefit from pooled resources (Linux Foundation $12.5M grants, GitHub $5.5M credits). Trusting collectively-hardened dependencies is safer than custom code that can’t afford equivalent AI scanning—contradicting recent anti-dependency arguments.
- Finding vulnerabilities ≠ fixing them. Mythos discovered 1,000+ bugs; 99% remain unpatched due to limited maintainer bandwidth and disclosure timelines. AI discovers at scale, but organizations still need capacity to triage, test patches, and deploy fixes. Security theater risk: passing AI audit doesn’t equal secure if bugs aren’t fixed.
- Proof-of-work security creates unsustainable inequality. Well-funded enterprises can afford continuous AI hardening; small teams get priced out. Long-term alternatives include formal verification, memory-safe languages (Rust), and architectural simplicity. Traditional security fundamentals (defense-in-depth, monitoring, access controls) remain essential—AI hardening supplements, doesn’t replace.