
An unnamed enterprise accidentally spent $500 million on Anthropic’s Claude in a single month after deploying AI access to employees with no usage caps. The story broke on May 28. It was barely a surprise to anyone paying attention. Uber had already admitted burning through its entire 2026 AI budget in four months—roughly 5,000 engineers on Claude Code, with heavy users racking up $500 to $2,000 per month individually. Meanwhile, Microsoft quietly began canceling most internal Claude Code licenses in its Experiences + Devices division, with a June 30 cutoff cited internally. Goldman Sachs published a report estimating AI agents could multiply enterprise token demand 24 times by 2030. The math is already not working for companies that skipped the part about cost controls.
The SaaS Mindset Meets Token Billing
Most enterprises approached AI tools the same way they buy Slack or Jira: a flat monthly seat cost, predictable, easy to forecast. That model does not apply to token-based AI billing. A single coding session with a large context window plus tool calls can cost $50 to $200. That is not an edge case—that is a capable developer doing two or three hours of real work with Claude Code. Multiply by 5,000 engineers and you have Uber’s situation. The companies that got burned were not reckless; they were applying a procurement framework designed for a different pricing model.
What Actually Causes a $500M Bill
Individual usage is not the driver. The driver is recursive agent loops. The pattern: Agent A generates a plan. Agent B reviews it. Agent C revises the plan. Agent D reviews the revision. Each step ingests the full context window. At $15 per million tokens for Claude Opus 4.8, a single 1M-token context read costs $15. A background monitoring agent reading that context every five minutes makes 12 reads an hour, which comes to $180 per hour. Fifty engineers with background agents running in parallel: $9,000 per hour, or $216,000 per day if the agents run around the clock, before a single line of code is committed. These loops do not just waste money. They produce bloated codebases filled with redundant boilerplate that nobody can confidently own six months later.
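The loop arithmetic above is easy to reproduce and to rerun with your own numbers. The following is a minimal sketch that uses only the figures quoted in the paragraph; the function names are illustrative, and the daily total assumes the agents run 24 hours a day.

```python
# Inputs taken from the paragraph above.
PRICE_PER_MILLION_TOKENS = 15.00  # USD per million tokens, as quoted
CONTEXT_TOKENS = 1_000_000        # one full 1M-token context read
POLL_INTERVAL_MINUTES = 5         # background agent re-reads every five minutes
ENGINEERS = 50                    # engineers running one background agent each


def cost_per_read(tokens: int, price_per_million: float) -> float:
    """Cost in USD of ingesting `tokens` once."""
    return tokens / 1_000_000 * price_per_million


def hourly_cost(tokens: int, price_per_million: float, interval_minutes: float) -> float:
    """Cost in USD per hour for one agent re-reading the context on a fixed interval."""
    reads_per_hour = 60 / interval_minutes
    return cost_per_read(tokens, price_per_million) * reads_per_hour


per_agent_hour = hourly_cost(CONTEXT_TOKENS, PRICE_PER_MILLION_TOKENS, POLL_INTERVAL_MINUTES)
fleet_hour = per_agent_hour * ENGINEERS
fleet_day = fleet_hour * 24  # assumption: agents run around the clock

print(f"per agent: ${per_agent_hour:,.0f}/hour")
print(f"fleet:     ${fleet_hour:,.0f}/hour, ${fleet_day:,.0f}/day")
```

Changing the polling interval is the fastest lever here: at 30 minutes instead of five, the same fleet costs one sixth as much.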
Goldman Sachs quantified the trajectory: AI agents could drive token consumption from roughly 5 quadrillion tokens per month today to 120 quadrillion by 2030, a 24-fold increase. A single programming agent consumes approximately 7 million tokens per day. The firm’s thesis assumes enterprises build cost controls to capture the value. Without them, falling token prices cannot offset volume that grows faster than prices decline.
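To see how far prices would have to fall just to hold spend flat, treat spend as volume times price. The sketch below uses only the two volume figures quoted above; the price ratios passed to `relative_spend` are hypothetical scenarios, not forecasts.

```python
TOKENS_PER_MONTH_TODAY = 5e15   # ~5 quadrillion, as quoted above
TOKENS_PER_MONTH_2030 = 120e15  # ~120 quadrillion, as quoted above

volume_growth = TOKENS_PER_MONTH_2030 / TOKENS_PER_MONTH_TODAY

# Spend stays flat only if price falls to 1/volume_growth of today's level.
breakeven_price_ratio = 1 / volume_growth
required_price_decline = 1 - breakeven_price_ratio


def relative_spend(price_ratio: float) -> float:
    """Spend in 2030 relative to today, given 2030 price as a fraction of today's."""
    return volume_growth * price_ratio


print(f"volume growth: {volume_growth:.0f}x")
print(f"price decline needed for flat spend: {required_price_decline:.1%}")
print(f"spend if prices only halve: {relative_spend(0.5):.0f}x today")
```

Under these figures, per-token prices would need to drop by roughly 96 percent for total spend to stay where it is.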
Five Guardrails You Can Ship This Week
None of these require waiting for a vendor fix. All of them are available now.
- Cap retries and max_tokens per agent call. Set max_tokens on every LLM call and limit retries to three. Most cost overruns compound through uncapped retry loops, not individual sessions.
- Tier your models. Use Claude Haiku for review, classification, and summarization passes. Reserve Sonnet or Opus for generation. A review pass that costs $0.003 per call instead of $0.015 compounds to significant savings at scale.
- Set per-user daily spending alerts at $50. That is a reasonable ceiling for a productive day. If a user exceeds it, something automated is probably running unsupervised.
- Tag every API call. Use the Anthropic Usage and Cost API to tag calls with team, project, and task identifiers. Without attribution, you cannot audit and you cannot improve.
- Add a human checkpoint for operations above $1. Pre-flight cost estimation is not built into Anthropic’s tooling yet, but you can estimate: count context tokens before sending, multiply by the model rate, and surface a confirmation prompt above your threshold.
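Several of these guardrails fit in a few dozen lines of client-side code. What follows is a minimal sketch, not a reference implementation: `send` is a hypothetical stand-in for whatever LLM client you use, the tier names and the cheaper rate are placeholder assumptions (only the $15 figure appears in this article), and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
from typing import Callable

# USD per million input tokens. Only 15.00 comes from the article;
# the "review" rate is an illustrative placeholder.
RATES_PER_MILLION = {"review": 1.00, "generation": 15.00}

CHEAP_TASKS = {"review", "classification", "summarization"}
CONFIRM_THRESHOLD_USD = 1.00  # human checkpoint threshold from the list above
MAX_RETRIES = 3               # retry cap from the list above


def pick_tier(task: str) -> str:
    """Route review-style passes to the cheaper tier, everything else to the larger one."""
    return "review" if task in CHEAP_TASKS else "generation"


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def estimate_cost_usd(prompt: str, tier: str) -> float:
    """Pre-flight input cost: estimated tokens times the tier's rate."""
    return estimate_tokens(prompt) / 1_000_000 * RATES_PER_MILLION[tier]


def needs_confirmation(prompt: str, tier: str) -> bool:
    """True when the estimated cost is above the human-checkpoint threshold."""
    return estimate_cost_usd(prompt, tier) > CONFIRM_THRESHOLD_USD


def call_with_caps(send: Callable[[str, int], str], prompt: str, max_tokens: int = 1024) -> str:
    """Call `send` with an explicit output cap: one attempt plus at most MAX_RETRIES retries."""
    last_error = None
    for _attempt in range(1 + MAX_RETRIES):
        try:
            return send(prompt, max_tokens)
        except Exception as exc:  # a real client would catch only transient errors
            last_error = exc
    raise RuntimeError(f"gave up after {MAX_RETRIES} retries") from last_error


# Usage: a prompt of about 100,000 estimated tokens on the larger tier.
big_prompt = "x" * 400_000
tier = pick_tier("generation")
print(f"estimated cost: ${estimate_cost_usd(big_prompt, tier):.2f}")
print("ask a human first" if needs_confirmation(big_prompt, tier) else "send")
```

The estimate covers input tokens only. Output tokens are typically billed separately, which is one more reason to set max_tokens on every call.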
What Anthropic Provides and Where the Gaps Are
Anthropic does offer organizational and per-user spending limits in the admin panel, plus the Usage and Cost API for programmatic access to token counts per request and per user. The new Compliance API gives real-time usage data for governance teams. Claude Code analytics show acceptance rates and lines of code per developer. These are real controls that most affected companies were not using.
The gaps are equally real. There is no auto-pause when a budget threshold is hit. There is no pre-flight cost estimation before a multi-step task executes. There is no built-in agent loop detection. The $500M incident happened on a platform that had controls available—the company simply did not configure them.
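Until those gaps close, a thin client-side wrapper can approximate both the auto-pause and the loop detection. This is a sketch under stated assumptions: `BudgetGuard` and its methods are hypothetical names, the $50 limit is reused here as a hard stop rather than an alert, the loop check is a naive count of repeated step signatures, and the daily reset is left out for brevity.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a call would push a user past the daily limit."""


class LoopSuspected(Exception):
    """Raised when the same step repeats more often than allowed."""


class BudgetGuard:
    """Tracks per-user spend and repeated steps; call check() before each LLM request."""

    def __init__(self, daily_limit_usd: float = 50.0, max_repeats: int = 3):
        self.daily_limit_usd = daily_limit_usd
        self.max_repeats = max_repeats
        self.spent = defaultdict(float)  # user -> USD recorded so far today
        self.seen = defaultdict(int)     # (user, step signature) -> times requested

    def check(self, user: str, step_signature: str, estimated_cost_usd: float) -> None:
        """Raise before sending if the call would breach the budget or looks like a loop."""
        if self.spent[user] + estimated_cost_usd > self.daily_limit_usd:
            raise BudgetExceeded(f"{user} would exceed ${self.daily_limit_usd:.2f}")
        self.seen[(user, step_signature)] += 1
        if self.seen[(user, step_signature)] > self.max_repeats:
            raise LoopSuspected(f"{step_signature!r} repeated for {user}")

    def record(self, user: str, actual_cost_usd: float) -> None:
        """Add the real cost of a completed call to the user's running total."""
        self.spent[user] += actual_cost_usd


# Usage: three $15 context reads pass, and the fourth is refused.
guard = BudgetGuard(daily_limit_usd=50.0, max_repeats=10)
for _ in range(3):
    guard.check("alice", "monitor-context", 15.0)
    guard.record("alice", 15.0)
try:
    guard.check("alice", "monitor-context", 15.0)
except BudgetExceeded as exc:
    print(f"paused: {exc}")
```

A step signature could be a hash of the agent name plus the task ID. The point is to notice when Agent D is reviewing the same revision for the fifth time.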
This Is a Developer Responsibility
Uber’s COO told Fortune he finds it “difficult to connect the increased use of AI tools to new consumer features.” That matters more than the cost figures. The ROI measurement failure is as serious as the billing failure. Developers ship agents. Developers own the cost model of what they ship. The discipline of estimating, capping, attributing, and monitoring token costs is now as foundational as writing tests or monitoring error rates. The companies pulling back on AI tools are not doing so because AI does not work—they are doing so because they shipped agents the same way they shipped batch jobs in 2018, with no visibility into cost until the invoice arrived.
The Goldman Sachs projection of 24x token demand growth assumes the value gets captured. It does not get captured automatically. It gets captured by teams that built cost controls before deploying agents at scale, not after reading a $500 million invoice.













