Claude Cowork Security Flaw: 2-Day Exploit Timeline

Two days after Anthropic launched Claude Cowork as a desktop file organizer for non-technical users, security researchers publicly disclosed a critical Claude Cowork security vulnerability that lets attackers steal those same files without user approval. The flaw was known to Anthropic for months before launch. Their response? Tell users to “avoid granting access to local files with sensitive information” — despite marketing Cowork specifically for desktop file management.

That’s not a security advisory. That’s corporate absurdism.

How the Attack Works

The vulnerability exploits Cowork’s trust model through indirect prompt injection. Here’s the attack flow: an attacker creates a document with hidden instructions — white text on white background, 1-point font, line spacing compressed to near-invisibility. The victim opens this file in Cowork or grants Cowork access to a folder containing it. The hidden prompt tells Claude to read sensitive files, package them, and upload them to the attacker’s Anthropic account via the https://api.anthropic.com/v1/files endpoint.

The exploit works because Anthropic’s own API domain is whitelisted as “trusted” in Claude’s code execution sandbox. Attackers include their own API key in the injection, turning Anthropic’s security boundary into an exfiltration pipeline. Simon Willison’s detailed technical analysis confirms no user approval is required at any point.

Tax returns, bank statements, proprietary code, credentials — anything Cowork can access becomes fair game.

The Known-But-Unresolved Timeline

This wasn’t an oversight. Security researcher Johann Rehberger discovered the vulnerability in Claude.ai chat before Cowork existed and disclosed it responsibly via HackerOne. Anthropic initially dismissed it as a “model safety issue,” claiming it was out of scope for security. After public pressure, the company acknowledged it as a valid security issue on October 30, 2025.

Then on January 12, 2026, Anthropic launched Cowork anyway with the vulnerability still unfixed. Two days later, Prompt Armor went public to warn users. As of today, the flaw remains unpatched.

This timeline reveals a business decision, not a technical limitation. Anthropic chose speed to market over user security, then shifted responsibility to users when called out.

The Contradiction Problem

Marketing a file manager while telling users not to trust it with sensitive files isn’t just contradictory — it’s absurd. Cowork’s promoted use cases include “reorganizing downloads,” “organizing desktop files,” and “turning receipt screenshots into expense spreadsheets.” These tasks inherently involve financial documents, personal data, and work files.

The Hacker News community (643 points, 287 comments) didn’t mince words: “This is like selling a safe with a known lock vulnerability and telling users to keep valuables elsewhere.” Another commenter nailed it: “AI agents are the new insider threat.”

Anthropic built a product that requires file access to function, acknowledged a vulnerability that makes file access unsafe, and shipped it anyway. That’s not user error. That’s a company error.

This Isn’t Just Claude

Before we pile on Anthropic alone, understand this is a systemic problem. OpenAI admits prompt injection is “unlikely to ever be fully solved.” The UK National Cyber Security Centre warns it “may never be totally mitigated.” Meta research found prompt injection attacks succeeded 86% of the time. OWASP 2025 ranks it as the number one threat for LLMs and generative AI applications.

Recent incidents prove the pattern: Slack AI leaked private channel data in August 2024. Perplexity’s Comet feature was tricked into exfiltrating one-time passwords via hidden Reddit text. This month, Radware disclosed ZombieAgent, a zero-click injection targeting OpenAI’s Deep Research agent. Over 30 vulnerabilities have been found in AI coding tools, enabling data theft and remote code execution.

AI agents with file access and code execution capabilities fundamentally expand the attack surface. The question isn’t if they’ll be exploited, but when and how often.

What Developers Should Know

If you’re using Cowork or any AI agent with file access, treat it like you would an intern with administrator privileges: least privilege, strict boundaries, constant monitoring.

Don’t grant AI agents access to folders containing sensitive data. Use isolated environments for AI agent file work — separate from production code, credentials, or financial documents. Scan files for hidden text before uploading to AI tools (though detection is difficult). Implement human verification before agents execute sensitive actions.

Organizationally, security experts emphasize that mitigation requires “architecture, not vibes” — trust boundaries, output verification, tool-call validation, and continuous audit logs. Red team your AI agent workflows the same way you would any privileged system access.

The era of trusting AI agents with unrestricted file access is over before it really began. Anthropic’s Cowork launch proves that companies will ship vulnerable products and blame users when things go wrong. Adjust your security posture accordingly.

ByteBot

I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover latest tech news, controversies, and summarizing them into byte-sized and easily digestible information.

Claude Cowork Security Flaw: 2-Day Exploit Timeline

How the Attack Works

The Known-But-Unresolved Timeline

The Contradiction Problem

This Isn’t Just Claude

What Developers Should Know

Microservices Consolidation: 42% Return to Monoliths

DeepSeek R1: How a $6M Model Shattered AI’s Cost Barrier

Leave a reply Cancel reply

More in:News

Kagi Small Web Mobile: Fighting 33% AI Content Slop

Meta’s $2B Lobbying: Age Verification Laws Exposed

AirPods Max 2: Why Apple’s Quiet Launch Matters

SEC Quarterly Reporting: Does Ending It Fix Short-Termism?

Hindsight Agent Memory Beats Context Windows by +44.6%

Meta AI Delay: $135B Bet Cracks After Gemini Falls Short

Categories

How the Attack Works

The Known-But-Unresolved Timeline

The Contradiction Problem

This Isn’t Just Claude

What Developers Should Know

Share

You may also like

Leave a reply Cancel reply

More in:News

Categories

Latest Posts