AI Broke CTF Competitions in 2026. Hiring Is Next.


At BSidesSF 2026 this spring, a team won the CTF by solving all 52 challenges. They built their winning agent in a weekend. Sixteen teams completed every single challenge. No challenge received fewer than 25 solves. The top 10 teams automated their entire pipeline — continuous monitoring, parallel agents, auto-submit — and most challenges fell within minutes of release. That’s not a competition result. That’s a format collapse.

CTF (Capture the Flag) competitions have served as the security industry’s primary talent identification and training mechanism for decades. Employers including Google, the NSA, and major cybersecurity firms recruit heavily from top CTF performers on CTFtime.org. If the open competition format is structurally broken, the talent pipeline it feeds is in trouble.

What Happened at BSidesSF 2026

Veria Labs, whose members play on the top-ranked US CTF team, built an autonomous agent that won first place by solving 52 of 52 challenges. Their open-sourced CTF agent uses a coordinator LLM to manage the competition while swarms of solver agents attack individual challenges in parallel Docker containers. Five model configurations race simultaneously: Claude Opus 4.6 (in medium and max modes), GPT-5.4, GPT-5.4-mini, and GPT-5.3-codex. The first to find the flag wins. Cost to replicate this infrastructure? A few hundred dollars a month in API credits.
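
The "first to find the flag wins" pattern can be sketched in a few lines of Python. This is a minimal illustration, not the Veria Labs agent: the solver is a stand-in that sleeps instead of calling a model, and the names `fake_solver` and `race_solvers` and the `CTF{...}` flag format are all assumptions made for the example.

```python
import asyncio
import re
from typing import Awaitable, Iterable, Optional, Tuple

# Assumed flag format for this sketch; real events define their own.
FLAG_PATTERN = re.compile(r"CTF\{[^}]+\}")


async def fake_solver(name: str, delay: float, output: str) -> Tuple[str, str]:
    """Hypothetical stand-in for a model-backed solver.

    A real solver would drive an LLM inside a sandbox; this one just
    waits and then scans its canned output for a flag.
    """
    await asyncio.sleep(delay)
    match = FLAG_PATTERN.search(output)
    if match is None:
        raise ValueError(f"{name} found no flag")
    return name, match.group(0)


async def race_solvers(
    solvers: Iterable[Awaitable[Tuple[str, str]]],
) -> Optional[Tuple[str, str]]:
    """Run all solvers concurrently and return the first (name, flag).

    Solvers that fail are skipped; the remaining ones are cancelled as
    soon as one succeeds. Returns None if every solver fails.
    """
    tasks = [asyncio.ensure_future(s) for s in solvers]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except ValueError:
                continue  # this solver gave up; wait for the next one
        return None
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    winner = asyncio.run(
        race_solvers(
            [
                fake_solver("slow", 0.2, "flag: CTF{demo}"),
                fake_solver("fast", 0.01, "flag: CTF{demo}"),
                fake_solver("broken", 0.0, "no luck"),
            ]
        )
    )
    print(winner)
```

Cancelling the losers matters for cost: every solver still running after the flag is found keeps spending API credits on a challenge that is already solved.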

This isn’t an isolated result. An experienced competitor’s firsthand analysis of the same event estimated that they would have placed 75th without AI assistance, versus 5th with it. Categories that previously separated skilled hackers from amateurs, namely binary exploitation and cryptography, now fall routinely to Claude Code and Codex. “The competition has shifted from who can solve the most to who can deploy the best infrastructure,” the analysis concluded. That’s a money game, not a skills game.

Why the Hiring Signal Is Now Miscalibrated

CTF performance has long been a proxy for raw security skill that certifications can’t replicate. You either get the flag or you don’t. No multiple-choice exam, no partial credit. That reliability is gone. A top ranking in 2026 may reflect a generous API budget and strong agent orchestration skills, not exploitation capability. The instrument is broken.

The UK’s AI Security Institute evaluated Claude Mythos Preview on expert-level CTF tasks and found a 73% success rate. Frontier models are near-expert at what used to be the best available hands-on security test. Companies that use CTF rankings in hiring will need to supplement those rankings with explicit manual exploitation assessments that exclude AI assistance. The screening process needs rebuilding.


What’s Still Holding (For Now)

The format collapse isn’t total. Top-tier competitions — DEF CON Quals, hxp — still hold because they deliberately design for this moment. As one challenge author put it, “challenge design increasingly means anticipating what the next frontier model will be able to do — which is a new and genuinely difficult constraint.” The most AI-resistant challenges share common traits: they require deep knowledge of poorly-documented software internals, areas where documentation contradicts source code, or completely novel categories with minimal public writeups for models to train on.

However, the constraint itself reveals the problem. Challenge designers have always competed against skilled humans. Now they’re competing against models that already solve 73% of expert-level tasks and improve every six months. That’s an arms race challenge authors are structurally unlikely to win.

What the Security Community Is Doing About It

The DEF CON 34 CTF qualifier runs May 22-24, 2026, under new organizers, the “Benevolent Bureau of Birds,” formed specifically to navigate the AI era. DEF CON Singapore 2026 already features dedicated AI/IoT challenge tracks. Researchers have proposed formalizing three autonomy levels for CTF competitions (human-in-the-loop, hybrid human-AI, and fully autonomous), each requiring traceable submissions that include conversation logs and agent trajectories. If DEF CON 34 in August can’t make the format work, the community will have to ask whether CTFs need a successor institution, not just an update.
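
To make the proposal concrete, here is a rough sketch of what a traceable submission record might look like. The proposal as described names only the three levels and the two required artifacts; the class names, field names, and the `is_traceable` rule below are hypothetical choices for illustration, not part of any published specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AutonomyLevel(Enum):
    """The three proposed competition tiers."""

    HUMAN_IN_THE_LOOP = "human-in-the-loop"
    HYBRID = "hybrid human-AI"
    FULLY_AUTONOMOUS = "fully autonomous"


@dataclass
class Submission:
    """Hypothetical record a scoring server could require per flag."""

    team: str
    challenge: str
    flag: str
    level: AutonomyLevel
    # Assumed representation: one string per chat turn or agent step.
    conversation_log: List[str] = field(default_factory=list)
    agent_trajectory: List[str] = field(default_factory=list)

    def is_traceable(self) -> bool:
        """Assumed rule: both artifacts must be present at every level."""
        return bool(self.conversation_log) and bool(self.agent_trajectory)


if __name__ == "__main__":
    entry = Submission(
        team="example-team",
        challenge="pwn-01",
        flag="CTF{demo}",
        level=AutonomyLevel.FULLY_AUTONOMOUS,
    )
    # Missing logs: an organizer following this rule would reject the entry.
    print(entry.is_traceable())
```

A scoreboard built this way could rank each tier separately, so a hand-solved flag and an agent-solved flag never compete for the same placement.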

The community is also debating whether AI orchestration is simply the new skill being measured. Building better agent pipelines does require real engineering. But it’s not the same as being able to exploit a buffer overflow by hand at 2 AM. The security field needs both.

Key Takeaways

BSidesSF 2026 demonstrated the open CTF format is broken: 16 teams completed everything, no challenge had fewer than 25 solves, and the winning agent was built in a weekend on commercial API access costing a few hundred dollars a month
CTF rankings are now a miscalibrated hiring signal — companies need to add manual exploitation assessments that explicitly exclude AI assistance
Top-tier competitions (DEF CON Quals, hxp) remain largely AI-resistant by deliberately designing for poorly-documented internals and novel categories — but they’re in an arms race they’re structurally unlikely to win long-term
DEF CON 34 (August 2026) will be the real test of whether the security community’s flagship competition can adapt or whether a new institution needs to replace it

The HN discussion (412 points, 438 comments) has more from practitioners actively working through what comes next.

ByteBot
