Anthropic Leaks Mythos Model & Claude Code in 5 Days

Anthropic, the AI company positioning itself as the “careful” and “responsible” alternative to OpenAI, suffered two major security failures within five days in late March 2026. On March 26, the company exposed roughly 3,000 internal files via a misconfigured content management system, inadvertently revealing “Mythos” (also called “Capybara”), their most powerful AI model with “unprecedented cybersecurity risks.” Five days later, on March 31, Anthropic leaked the entire Claude Code source code—512,000 lines of TypeScript—through an npm packaging error. Both failures stem from basic operational mistakes: CMS misconfigurations and bundler defaults that any developer knows to check. The irony is brutal for developers: a company built on AI safety claims can’t get basic operational security right.

The Mythos Leak: Dangerous Model Exposed Before Ready

On March 26, 2026, Anthropic accidentally exposed approximately 3,000 internal files through a misconfigured CMS, revealing draft documentation for “Mythos” (codename: “Capybara”), their most powerful AI model. Security researchers Roy Paz from LayerX Security and Alexandre Pauwels from the University of Cambridge discovered the publicly accessible files. The leaked draft blog warned of “unprecedented cybersecurity risks” and stated Mythos is “currently far ahead of any other AI model in cyber capabilities.”

The draft described Capybara as “a new name for a new tier of model: larger and more intelligent than our Opus models—which were, until now, our most powerful.” Compared to Claude Opus 4.6, Mythos achieves “dramatically higher scores” on software coding, academic reasoning, and cybersecurity benchmarks. Moreover, Anthropic had been privately warning government officials that the model makes large-scale cyberattacks “much more likely in 2026.” The leak exposed offensive AI capabilities before safety measures were ready—operational failure, not just reputational embarrassment.

Five Days Later: 512,000 Lines Leaked via npm

On March 31, 2026—just five days after the Mythos leak—security researcher Chaofan Shou discovered that Claude Code’s npm package (v2.1.88) included source maps exposing the entire codebase: 512,000 lines of TypeScript across 2,000 files. The root cause was the Bun bundler’s default behavior: it generates source maps unless explicitly disabled. A misconfigured .npmignore or files field in package.json then failed to exclude the *.map files from the published package. Developers mirrored the code to GitHub within hours and began analyzing Anthropic’s proprietary agent architecture.
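The files field mentioned above is the simpler of the two defenses: npm publishes only what the allowlist matches, so generated *.map files never reach the tarball unless explicitly listed. A minimal sketch of such a package.json (package and file names are illustrative, not Anthropic’s):

```json
{
  "name": "example-cli",
  "version": "1.0.0",
  "files": [
    "dist/**/*.js",
    "dist/**/*.d.ts"
  ]
}
```

Unlike .npmignore, which must name everything to exclude, files inverts the default: anything unmatched stays out of the published package.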

The leaked code revealed previously secret features. KAIROS, an always-on background agent framework, enables persistent agent capabilities beyond single-session interactions. “Undercover Mode,” perhaps more controversially, includes system prompts explicitly stating: “Your commit messages, PR titles, and PR bodies MUST NOT contain ANY Anthropic-internal information.” The feature is designed to hide Anthropic authorship in open-source contributions, raising ethical questions about transparency in AI-generated code. Anthropic confirmed the leak in a statement: “This was a release packaging issue caused by human error, not a security breach. No sensitive customer data or credentials were involved or exposed.”

The Technical Failures: Configuration 101 Mistakes

Both leaks resulted from elementary operational errors that developers know to prevent. The CMS misconfiguration allowed content marked “unpublished” to remain publicly accessible via cache—no authentication required. Anthropic attributed it to “human error in CMS configuration,” which is accurate but doesn’t address why basic access testing wasn’t part of their pre-publish checklist.
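That missing access test is easy to script: fetch a draft URL from a session with no cookies or tokens, and fail the checklist if the page comes back readable. A minimal sketch in shell—the draft URL is a placeholder, since the real CMS layout was never disclosed:

```shell
# Fail the pre-publish checklist if an "unpublished" page is readable
# without authentication.
check_unpublished() {
  # $1 = HTTP status code returned for a draft URL, fetched with no credentials
  if [ "$1" = "200" ]; then
    echo "FAIL: draft is publicly readable"
    return 1
  fi
  echo "OK: draft not served (status $1)"
}

# Example wiring (commented out; needs network access; URL is hypothetical):
# status=$(curl -s -o /dev/null -w '%{http_code}' 'https://cms.example.com/drafts/mythos')
# check_unpublished "$status"
```

The point is less the three lines of shell than where they run: in CI before every publish, from a machine that holds no CMS session at all.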

The npm source map exposure is an even more common mistake. The Bun bundler generates source maps by default unless you set sourcemap: 'none' in your production build configuration. A single line in .npmignore could have prevented the leak: *.map. Developers can verify package contents before publishing with npm pack && tar -tzf *.tgz | grep '\.map$': if any output appears, source maps are included and the package needs fixing. The developer community’s reaction on Hacker News and DEV Community was empathetic: “This could happen to anyone. I’ve made the same npm source map mistake.” The failures are relatable, which makes them universal lessons, not just Anthropic-specific incompetence.
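That one-liner can be wrapped into a reusable pre-publish guard. A sketch, where tarball_has_maps is a helper name of my own, not part of npm:

```shell
# tarball_has_maps: succeed (exit 0) if the given npm tarball ships any
# .map files. Run it against the output of `npm pack` before every publish.
tarball_has_maps() {
  tar -tzf "$1" | grep -q '\.map$'
}

# Typical use (run in the package directory, commented out here):
# npm pack
# if tarball_has_maps ./*.tgz; then
#   echo 'source maps would be published; fix the files field or .npmignore' >&2
#   exit 1
# fi
```

Wired into CI as a required step, this turns “remember to check the tarball” into a gate that cannot be skipped under deadline pressure.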

Related: AI Code Quality Crisis: 84% Adoption, 29% Trust in 2026

Why This Matters: “Responsible AI” Requires Operational Competence

Anthropic built its identity around being the “careful AI company”—founded by ex-OpenAI safety researchers, vocal about AI risks, publishing detailed safety research. TechCrunch noted the timing: “Anthropic has built its public identity around being the careful AI company, publishing detailed work on AI risk and being vocal about responsibilities that come with building powerful technology.” Then they leaked their most dangerous model and flagship product’s source code within five days through basic operational errors.

The gap between safety claims and operational execution creates a credibility crisis. If Anthropic can’t secure a CMS configuration or check npm package contents, why trust their ability to implement AI safety measures at scale? The leaks don’t disprove Anthropic’s technical AI safety research, but they expose a dangerous gap: theoretical safety doesn’t translate to operational discipline. Furthermore, Anthropic’s framing—“no sensitive customer data or credentials exposed”—downplays severity. What actually leaked: a model with “unprecedented cybersecurity risks,” 512,000 lines of competitive intelligence, and offensive AI capabilities before safety measures were ready. Internal IP leaks are catastrophic even without customer PII.

Key Takeaways

  • Anthropic leaked Mythos model documentation (March 26) and Claude Code source (March 31) within five days—both from basic configuration errors (CMS misconfiguration, npm source maps)
  • The failures expose a credibility gap: “responsible AI” companies must demonstrate operational competence, not just theoretical safety research; a company that can’t secure its CMS settings undermines trust in its AI safety claims
  • Mythos leak is particularly concerning—model with “unprecedented cybersecurity risks” and offensive capabilities “far ahead of any other AI model” exposed before safety measures ready, with government warnings now public
  • Developer lessons are universal: never trust bundler defaults for production (check source map generation), always inspect package contents before npm publish (npm pack && tar -tzf), test “unpublished” CMS content from unauthenticated sessions
  • The leaked Claude Code features raise questions—KAIROS (always-on agent) architecture now public, “Undercover Mode” for hiding Anthropic authorship in open-source contributions sparks transparency debate
ByteBot
I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.
