On March 6, 2026, Anthropic silently changed Claude’s prompt cache TTL from 1 hour to 5 minutes without any announcement. Developers only discovered the downgrade when their Claude Code Max quotas—supposed to last 5 hours—exhausted in 19 minutes. Analysis of 119,866 API calls shows the change caused 17-32% cost inflation, with one developer documenting $2,530 in surprise overpayments. No blog post, no email, no changelog. Just higher bills and broken budgets.
5-Minute TTL Turns Breaks Into Bills
The technical change is straightforward but devastating. Anthropic’s prompt caching lets developers store context (project files, documentation, conversation history) on the API side to avoid re-transmitting it with every request. The cache has two write tiers: a 5-minute TTL at $3.75/MTok and a 1-hour TTL at $6.00/MTok. Cache reads cost just $0.30/MTok, 12.5x cheaper than even the 5-minute write rate.
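For readers who haven't used the feature: caching is opted into per content block. The sketch below follows the shape of Anthropic's documented Messages API `cache_control` parameter, but treat the exact field names (especially the `"ttl"` extension) as an assumption to verify against current docs; the model id and context text are hypothetical.

```python
# Hedged sketch of a Messages API request body that marks a large context
# block cacheable. Field names follow Anthropic's documented cache_control
# parameter at the time of writing; verify before relying on them.
request_body = {
    "model": "claude-sonnet-4-6",   # hypothetical model id from the article
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large project context: files, docs, history>",
            # Marks this block cacheable. "ttl": "1h" requests the 1-hour
            # tier; omitting it falls back to the 5-minute default.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Refactor the auth module."}],
}
```

The point of the TTL change is visible right here: whoever controls that default (the client tool, not the developer) controls which write tier you pay for.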
Before March 6, Claude Code defaulted to 1-hour TTL. Developers paid the higher write cost once per hour, then enjoyed cheap reads for subsequent requests. After March 6, the default shifted to 5-minute TTL. Any pause longer than 5 minutes—a meeting, a debugging session, a coffee break—forces complete cache expiration. The next request must re-create the entire cache at write rates instead of reading it cheaply.
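The cost swing is easy to model. The sketch below uses only the price tiers from the article; the context size, request count, and number of breaks are hypothetical session parameters chosen for illustration.

```python
# Cost model for re-sending a cached context across one coding session.
# Prices come from the article; session numbers are hypothetical.

WRITE_5MIN = 3.75   # $/MTok, 5-minute-TTL cache write
WRITE_1H   = 6.00   # $/MTok, 1-hour-TTL cache write
READ       = 0.30   # $/MTok, cache read

def session_cost(context_mtok, requests, pauses_over_ttl, write_rate):
    """Cost of keeping a context cached across a session.

    pauses_over_ttl: how many times the cache expired mid-session,
    forcing a full re-write instead of a cheap read.
    """
    writes = 1 + pauses_over_ttl        # initial write plus each expiry
    reads = requests - writes
    return context_mtok * (writes * write_rate + reads * READ)

# A 0.2 MTok context, 20 requests over an afternoon with 4 breaks > 5 min:
cost_1h = session_cost(0.2, 20, 0, WRITE_1H)    # cache survives every break
cost_5m = session_cost(0.2, 20, 4, WRITE_5MIN)  # 4 forced re-writes
print(f"1-hour TTL:   ${cost_1h:.2f}")
print(f"5-minute TTL: ${cost_5m:.2f}")
```

Under these assumptions the 5-minute tier roughly doubles the session cost ($4.65 vs $2.34), even though its per-write rate is lower, because every break past 5 minutes converts a $0.30/MTok read into a $3.75/MTok re-write.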
Real coding sessions have natural pauses. February data showed 1-hour TTL generated just 1.1% overhead above baseline costs, nearly optimal efficiency. March data showed 5-minute TTL generated 25.9% overhead. That’s not optimization. That’s a 92% service downgrade disguised as a technical adjustment.
The $2,500 Surprise
The financial impact isn’t theoretical. One developer’s analysis of 119,866 API calls from January through April 2026 documents $2,530.88 in total overpayments: $949.08 for Sonnet 4.6, $1,581.80 for Opus 4.6. Both models showed 17.1% waste above expected costs based on February’s 1-hour TTL baseline.
February’s numbers prove the point. With 1-hour TTL dominant for 33 consecutive days, actual costs matched expected costs almost perfectly: just 1.1% overhead. March brought the TTL change, and overhead spiked to 25.9%. The developer expected to pay $4,612.09 for Sonnet based on February patterns. They actually paid $5,561.17—a $949.08 surprise with zero advance notice.
Claude Code Max subscribers saw quota exhaustion accelerate from expected 5-hour sessions to actual 19-minute burnouts. GitHub issue #46829 captured the discovery pattern: “I thought I had a bug.” It wasn’t a bug. It was Anthropic changing the economics of their service without telling anyone.
One-Shot Requests? The Data Says Otherwise
Anthropic’s Jarred Sumner defended the change in GitHub issue #46829, claiming “The March 6 change makes Claude Code cheaper, not more expensive.” The reasoning: many requests are “one-shot calls where the cached context is used once and not revisited,” so 1-hour TTL wastes money on expensive writes that never get read. Since 1-hour writes cost 2x base input versus 5-minute writes at 1.25x, switching to 5-minute TTL saves money on one-shot patterns.
However, the user data contradicts this defense. February showed 1.1% overhead with 1-hour TTL—nearly perfect efficiency. If most requests were truly one-shot, February would show massive waste from expensive writes that never got read. Instead, it showed the opposite: developers were getting value from 1-hour caching. March’s 25.9% overhead proves 5-minute TTL is more expensive for real usage patterns, not cheaper.
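The arithmetic behind the one-shot defense can be made explicit. Using the article's price tiers, the question is: how many cache hits in the 5-to-60-minute window does the 1-hour write premium need to pay for itself?

```python
# Break-even check on the one-shot defense, using the article's price tiers.
WRITE_5MIN, WRITE_1H, READ = 3.75, 6.00, 0.30  # $/MTok

premium_1h = WRITE_1H - WRITE_5MIN   # extra cost of the 1-hour write: $2.25
saved_per_hit = WRITE_5MIN - READ    # re-read instead of re-write: $3.45

# The 1-hour tier wins as soon as the context is reused in the window
# where the 5-minute cache would already have expired:
breakeven_hits = premium_1h / saved_per_hit
print(f"1-hour TTL breaks even after {breakeven_hits:.2f} late re-reads")
```

Break-even sits at roughly 0.65 reuses: a single request landing between 5 and 60 minutes after the write more than covers the premium. So the one-shot defense only holds if contexts are almost never revisited in that window, which is exactly what February's 1.1% overhead contradicts.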
Moreover, Anthropic’s response shifts blame to users: “You’re using it wrong” instead of “We changed things without notice.” Real coding sessions are multi-turn with natural pauses. Penalizing normal human behavior—taking breaks to think, debug, or meet—isn’t optimization. It’s choosing provider economics over customer economics.
It’s Not Just Anthropic
This pattern extends across the AI API industry. X (formerly Twitter) killed free API access overnight in February 2023, destroying third-party apps built on that tier. Amazon introduced $1,400/year SP-API subscription fees with just three months’ notice in November 2025. OpenAI has a history of undocumented rate limit changes that developers discover through production failures.
The common thread: providers treat B2B developer APIs like consumer apps where terms change freely. Standard SaaS practice requires 30-90 day notice for pricing changes, public changelogs for API modifications, and migration paths for affected customers. AI providers skip all of that, then frame downgrades as “optimizations” when users complain.
The February data from Claude Code proves this framing is gaslighting. When users can document 17% cost increases with hard numbers, calling it “cheaper” doesn’t fly. The Register broke the cache TTL story, not Anthropic. That’s backwards for a paid B2B service.
What Needs to Change
AI providers must adopt basic SaaS transparency standards. Give 30-90 days’ notice before implementing pricing-impacting changes. Publish changelogs for API modifications the same way GitHub, Stripe, and every other developer-focused platform does. Add dashboard visibility for cache analytics: let developers see which TTL their requests actually use, what their cache hit rates are, and how their true costs break down.
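Until providers ship that dashboard, developers can approximate it client-side from per-request usage data. The field names below follow Anthropic's documented usage object (`cache_creation_input_tokens`, `cache_read_input_tokens`), but verify them against current docs; the prices are the article's Sonnet-tier figures and the sample numbers are hypothetical.

```python
# Client-side sketch of the cache visibility the article asks for,
# aggregated from per-request usage objects. Prices from the article;
# usage field names assumed from Anthropic's documented API.

PRICES = {"input": 3.00, "cache_write": 3.75, "cache_read": 0.30}  # $/MTok

def cache_report(usages):
    """Return (cache hit rate, total spend in dollars) for a list of usages."""
    totals = {"input": 0, "cache_write": 0, "cache_read": 0}
    for u in usages:
        totals["input"] += u.get("input_tokens", 0)
        totals["cache_write"] += u.get("cache_creation_input_tokens", 0)
        totals["cache_read"] += u.get("cache_read_input_tokens", 0)
    cached = totals["cache_write"] + totals["cache_read"]
    hit_rate = totals["cache_read"] / cached if cached else 0.0
    cost = sum(totals[k] * PRICES[k] / 1_000_000 for k in totals)
    return hit_rate, cost

# Two requests: an initial cache write, then a read after a short pause.
sample = [
    {"input_tokens": 500, "cache_creation_input_tokens": 200_000},
    {"input_tokens": 400, "cache_read_input_tokens": 200_000},
]
hit_rate, cost = cache_report(sample)
print(f"cache hit rate: {hit_rate:.0%}, spend: ${cost:.2f}")
```

A sudden drop in that hit rate, with no change in your own code, is exactly the signal that would have caught the March 6 change on day one.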
Most importantly, stop treating pricing changes like internal technical tweaks. A 92% reduction in cache TTL that causes measurable cost increases for paying customers deserves announcement, explanation, and transition time. Developers build businesses on these APIs. Teams create budgets based on observed costs. Silent changes break trust.
Trust once broken is hard to rebuild. Developers have alternatives: OpenAI, Gemini, self-hosted LLMs. If AI providers don’t adopt transparency standards, they’ll face defensive budgeting (30% buffers for surprise changes) and accelerated churn. The competitive pressure could force change—or developers will vote with their wallets. Either way, silent downgrades need to stop.
Key Takeaways
- Anthropic changed cache TTL from 1 hour to 5 minutes on March 6, 2026, with zero public announcement—developers discovered through quota exhaustion and cost spikes
- The change caused documented 17-32% cost increases, with one analysis showing $2,530 in overpayments across 119,866 API calls over four months
- Anthropic’s defense that 5-minute TTL is “cheaper” contradicts user data: February (1h TTL) showed 1.1% overhead, March (5m TTL) showed 25.9% overhead
- This pattern extends across AI providers (X, Amazon, OpenAI) who make silent economic changes without notice, treating B2B developers like consumer app users
- AI providers must adopt SaaS transparency standards: 30-90 day notice for pricing changes, public changelogs, dashboard visibility, and migration paths for paying customers

