On March 6, 2026, Anthropic silently changed Claude’s prompt cache TTL from 1 hour to 5 minutes without any announcement. Developers only discovered the downgrade when their Claude Code Max quotas—supposed to last 5 hours—exhausted in 19 minutes. Analysis of 119,866 API calls shows the change caused 17-32% cost inflation, with one developer documenting $2,530 in surprise overpayments. No blog post, no email, no changelog. Just higher bills and broken budgets.
5-Minute TTL Turns Breaks Into Bills
The technical change is straightforward but devastating. Anthropic’s prompt caching lets developers store context (project files, documentation, conversation history) on the API side to avoid re-transmitting it with every request. The cache has two write tiers: a 5-minute TTL at $3.75/MTok and a 1-hour TTL at $6.00/MTok. Cache reads cost just $0.30/MTok, 12.5x cheaper than even the 5-minute write rate.
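For readers who haven't used the feature: caching is opted into per content block. The sketch below follows the shape of Anthropic's documented Messages API `cache_control` parameter, but treat the exact field names (especially the `"ttl"` extension) as an assumption to verify against current docs; the model id and context text are hypothetical.

```python
# Hedged sketch of a Messages API request body that marks a large context
# block cacheable. Field names follow Anthropic's documented cache_control
# parameter at the time of writing; verify before relying on them.
request_body = {
    "model": "claude-sonnet-4-6",   # hypothetical model id from the article
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large project context: files, docs, history>",
            # Marks this block cacheable. "ttl": "1h" requests the 1-hour
            # tier; omitting it falls back to the 5-minute default.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Refactor the auth module."}],
}
```

The point of the TTL change is visible right here: whoever controls that default (the client tool, not the developer) controls which write tier you pay for.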
Before March 6, Claude Code defaulted to 1-hour TTL. Developers paid the higher write cost once per hour, then enjoyed cheap reads for subsequent requests. After March 6, the default shifted to 5-minute TTL. Any pause longer than 5 minutes—a meeting, a debugging session, a coffee break—forces complete cache expiration. The next request must re-create the entire cache at write rates instead of reading it cheaply.
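The cost swing is easy to model. The sketch below uses only the price tiers from the article; the context size, request count, and number of breaks are hypothetical session parameters chosen for illustration.

```python
# Cost model for re-sending a cached context across one coding session.
# Prices come from the article; session numbers are hypothetical.

WRITE_5MIN = 3.75   # $/MTok, 5-minute-TTL cache write
WRITE_1H   = 6.00   # $/MTok, 1-hour-TTL cache write
READ       = 0.30   # $/MTok, cache read

def session_cost(context_mtok, requests, pauses_over_ttl, write_rate):
    """Cost of keeping a context cached across a session.

    pauses_over_ttl: how many times the cache expired mid-session,
    forcing a full re-write instead of a cheap read.
    """
    writes = 1 + pauses_over_ttl        # initial write plus each expiry
    reads = requests - writes
    return context_mtok * (writes * write_rate + reads * READ)

# A 0.2 MTok context, 20 requests over an afternoon with 4 breaks > 5 min:
cost_1h = session_cost(0.2, 20, 0, WRITE_1H)    # cache survives every break
cost_5m = session_cost(0.2, 20, 4, WRITE_5MIN)  # 4 forced re-writes
print(f"1-hour TTL:   ${cost_1h:.2f}")
print(f"5-minute TTL: ${cost_5m:.2f}")
```

Under these assumptions the 5-minute tier roughly doubles the session cost ($4.65 vs $2.34), even though its per-write rate is lower, because every break past 5 minutes converts a $0.30/MTok read into a $3.75/MTok re-write.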
Real coding sessions have natural pauses. February data showed 1-hour TTL generated just 1.1% overhead above baseline costs, nearly optimal efficiency. March data showed 5-minute TTL generated 25.9% overhead. That’s not optimization. That’s a 92% service downgrade disguised as a technical adjustment.
The $2,500 Surprise
The financial impact isn’t theoretical. One developer’s analysis of 119,866 API calls from January through April 2026 documents $2,530.88 in total overpayments: $949.08 for Sonnet 4.6, $1,581.80 for Opus 4.6. Both models showed 17.1% waste above expected costs based on February’s 1-hour TTL baseline.
February’s numbers prove the point. With 1-hour TTL dominant for 33 consecutive days, actual costs matched expected costs almost perfectly: just 1.1% overhead. March brought the TTL change, and overhead spiked to 25.9%. The developer expected to pay $4,612.09 for Sonnet based on February patterns. They actually paid $5,561.17—a $949.08 surprise with zero advance notice.
Claude Code Max subscribers saw quota exhaustion accelerate from expected 5-hour sessions to actual 19-minute burnouts. GitHub issue #46829 captured the discovery pattern: “I thought I had a bug.” It wasn’t a bug. It was Anthropic changing the economics of their service without telling anyone.
One-Shot Requests? The Data Says Otherwise
Anthropic’s Jarred Sumner defended the change in GitHub issue #46829, claiming “The March 6 change makes Claude Code cheaper, not more expensive.” The reasoning: many requests are “one-shot calls where the cached context is used once and not revisited,” so 1-hour TTL wastes money on expensive writes that never get read. Since 1-hour writes cost 2x base input versus 5-minute writes at 1.25x, switching to 5-minute TTL saves money on one-shot patterns.
However, the user data contradicts this defense. February showed 1.1% overhead with 1-hour TTL—nearly perfect efficiency. If most requests were truly one-shot, February would show massive waste from expensive writes that never got read. Instead, it showed the opposite: developers were getting value from 1-hour caching. March’s 25.9% overhead proves 5-minute TTL is more expensive for real usage patterns, not cheaper.
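The arithmetic behind the one-shot defense can be made explicit. Using the article's price tiers, the question is: how many cache hits in the 5-to-60-minute window does the 1-hour write premium need to pay for itself?

```python
# Break-even check on the one-shot defense, using the article's price tiers.
WRITE_5MIN, WRITE_1H, READ = 3.75, 6.00, 0.30  # $/MTok

premium_1h = WRITE_1H - WRITE_5MIN   # extra cost of the 1-hour write: $2.25
saved_per_hit = WRITE_5MIN - READ    # re-read instead of re-write: $3.45

# The 1-hour tier wins as soon as the context is reused in the window
# where the 5-minute cache would already have expired:
breakeven_hits = premium_1h / saved_per_hit
print(f"1-hour TTL breaks even after {breakeven_hits:.2f} late re-reads")
```

Break-even sits at roughly 0.65 reuses: a single request landing between 5 and 60 minutes after the write more than covers the premium. So the one-shot defense only holds if contexts are almost never revisited in that window, which is exactly what February's 1.1% overhead contradicts.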
Moreover, Anthropic’s response shifts blame to users: “You’re using it wrong” instead of “We changed things without notice.” Real coding sessions are multi-turn with natural pauses. Penalizing normal human behavior—taking breaks to think, debug, or meet—isn’t optimization. It’s choosing provider economics over customer economics.
It’s Not Just Anthropic
This pattern extends across the AI API industry. X (formerly Twitter) killed free API access overnight in February 2023, destroying third-party apps built on that tier. Amazon introduced $1,400/year SP-API subscription fees with just three months’ notice in November 2025. OpenAI has a history of undocumented rate limit changes that developers discover through production failures.
The common thread: providers treat B2B developer APIs like consumer apps where terms change freely. Standard SaaS practice requires 30-90 day notice for pricing changes, public changelogs for API modifications, and migration paths for affected customers. AI providers skip all of that, then frame downgrades as “optimizations” when users complain.
The February data from Claude Code proves this framing is gaslighting. When users can document 17% cost increases with hard numbers, calling it “cheaper” doesn’t fly. The Register broke the cache TTL story, not Anthropic. That’s backwards for a paid B2B service.
What Needs to Change
AI providers must adopt basic SaaS transparency standards. Give 30-90 days’ notice before implementing pricing-impacting changes. Publish changelogs for API modifications the same way GitHub, Stripe, and every other developer-focused platform does. Add dashboard visibility for cache analytics: let developers see which TTL their requests actually use, what their cache hit rates are, and how their true costs break down.
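Until providers ship that dashboard, developers can approximate it client-side from per-request usage data. The field names below follow Anthropic's documented usage object (`cache_creation_input_tokens`, `cache_read_input_tokens`), but verify them against current docs; the prices are the article's Sonnet-tier figures and the sample numbers are hypothetical.

```python
# Client-side sketch of the cache visibility the article asks for,
# aggregated from per-request usage objects. Prices from the article;
# usage field names assumed from Anthropic's documented API.

PRICES = {"input": 3.00, "cache_write": 3.75, "cache_read": 0.30}  # $/MTok

def cache_report(usages):
    """Return (cache hit rate, total spend in dollars) for a list of usages."""
    totals = {"input": 0, "cache_write": 0, "cache_read": 0}
    for u in usages:
        totals["input"] += u.get("input_tokens", 0)
        totals["cache_write"] += u.get("cache_creation_input_tokens", 0)
        totals["cache_read"] += u.get("cache_read_input_tokens", 0)
    cached = totals["cache_write"] + totals["cache_read"]
    hit_rate = totals["cache_read"] / cached if cached else 0.0
    cost = sum(totals[k] * PRICES[k] / 1_000_000 for k in totals)
    return hit_rate, cost

# Two requests: an initial cache write, then a read after a short pause.
sample = [
    {"input_tokens": 500, "cache_creation_input_tokens": 200_000},
    {"input_tokens": 400, "cache_read_input_tokens": 200_000},
]
hit_rate, cost = cache_report(sample)
print(f"cache hit rate: {hit_rate:.0%}, spend: ${cost:.2f}")
```

A sudden drop in that hit rate, with no change in your own code, is exactly the signal that would have caught the March 6 change on day one.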
Most importantly, stop treating pricing changes like internal technical tweaks. A 92% reduction in cache TTL that causes measurable cost increases for paying customers deserves announcement, explanation, and transition time. Developers build businesses on these APIs. Teams create budgets based on observed costs. Silent changes break trust.
Trust once broken is hard to rebuild. Developers have alternatives: OpenAI, Gemini, self-hosted LLMs. If AI providers don’t adopt transparency standards, they’ll face defensive budgeting (30% buffers for surprise changes) and accelerated churn. The competitive pressure could force change—or developers will vote with their wallets. Either way, silent downgrades need to stop.
Key Takeaways
- Anthropic changed cache TTL from 1 hour to 5 minutes on March 6, 2026, with zero public announcement—developers discovered through quota exhaustion and cost spikes
- The change caused documented 17-32% cost increases, with one analysis showing $2,530 in overpayments across 119,866 API calls over four months
- Anthropic’s defense that 5-minute TTL is “cheaper” contradicts user data: February (1h TTL) showed 1.1% overhead, March (5m TTL) showed 25.9% overhead
- This pattern extends across AI providers (X, Amazon, OpenAI) who make silent economic changes without notice, treating B2B developers like consumer app users
- AI providers must adopt SaaS transparency standards: 30-90 day notice for pricing changes, public changelogs, dashboard visibility, and migration paths for paying customers

