Claude's Cache Problem: Why AI Coding Sessions Are Burning Through Quotas Faster Than Ever
Anthropic's decision to reduce Claude's prompt cache duration from one hour to five minutes has triggered a wave of complaints from developers who say their usage quotas are depleting far faster than before, despite the company's assurances that costs should remain stable. The technical tweak, which took effect in early March, has left even long-term subscribers struggling to understand why their AI coding sessions now consume tokens at an alarming rate.
What Changed With Claude's Cache System?
To understand the frustration, it helps to know how Claude's prompt caching works. When developers use Claude Code, they send additional context along with their requests, such as existing code files or background instructions. Rather than reprocessing this context every time, Claude caches it to save computing resources and costs. The cache can operate with either a five-minute or one-hour time-to-live (TTL), which determines how long the cached data remains available before expiring.
Writing to the five-minute cache costs 25 percent more in tokens, while writing to the one-hour cache costs 100 percent more. However, reading from cache is dramatically cheaper, costing around 10 percent of the base price. Anthropic reduced the TTL back to five minutes around March 7, after briefly offering the one-hour option starting February 1.
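The trade-off between the two TTLs comes down to arithmetic. A rough sketch, using only the multipliers quoted above (the base price is normalized to 1.0, and the session shape is a made-up illustration, not measured data):

```python
# Cache pricing multipliers as described above; base input price normalized
# to 1.0, so results are in "base-price token equivalents", not dollars.
BASE = 1.0
WRITE_5MIN = 1.25 * BASE  # writing to the 5-minute cache: +25%
WRITE_1HR = 2.00 * BASE   # writing to the 1-hour cache: +100%
READ = 0.10 * BASE        # reading from cache: ~10% of base price

def session_cost(context_tokens, rewrites, reads, write_price):
    """Token cost of `rewrites` full cache writes plus `reads` cache hits
    on the same context block."""
    return context_tokens * (rewrites * write_price + reads * READ)

# Hypothetical long session on a 200k-token context with several idle gaps:
# under a 5-minute TTL each gap forces a rewrite (say 7 writes total),
# while a 1-hour TTL would survive the gaps with a single write.
five_min_ttl = session_cost(200_000, rewrites=7, reads=20, write_price=WRITE_5MIN)
one_hour_ttl = session_cost(200_000, rewrites=1, reads=20, write_price=WRITE_1HR)
print(five_min_ttl, one_hour_ttl)  # the short TTL costs far more here
```

The same formula also shows the reverse: for a one-shot request with zero rewrites after the first, the cheaper 5-minute write wins, which is exactly the disagreement between Anthropic and long-session users described below.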
Why Are Developers Running Out of Quota So Quickly?
Sean Swanson, a Claude Code user and long-term subscriber, documented the issue in a detailed bug report. He noted that the five-minute cache TTL is "disproportionately punishing for the long-session, high-context use case that defines Claude Code usage." Swanson had been a $200 per month subscriber for over six months without hitting quota limits, but after the March change, he began exhausting his allowance far faster.
The problem intensifies when developers use Claude's larger context windows. Paid plans now offer access to Claude Opus 4.6 and Sonnet 4.6 models with a one-million-token context window, roughly equivalent to processing 750,000 words at once. When the cache expires after five minutes and developers continue a session, they face a full cache miss, meaning the system must reprocess all that context from scratch, consuming significantly more tokens.
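To see why a full cache miss on a large context stings, compare resuming a session while the cache is still warm against resuming after it has expired. A minimal sketch using the multipliers from the previous section (the 1M-token window comes from the article; the unit cost is normalized, not a real price):

```python
# Resuming a 1M-token session: warm cache vs. full cache miss.
# Multipliers per the article; base input price normalized to 1.0.
CONTEXT = 1_000_000      # one-million-token context window
CACHE_READ = 0.10        # cache hit: ~10% of base input price
CACHE_WRITE_5MIN = 1.25  # full miss: reprocess everything + 5-min rewrite

warm_resume = CONTEXT * CACHE_READ        # cache still alive
cold_resume = CONTEXT * CACHE_WRITE_5MIN  # cache expired: start from scratch

print(cold_resume / warm_resume)  # a cold resume costs 12.5x a warm one
```

Under these assumptions, every coffee break longer than five minutes multiplies the cost of the next turn by an order of magnitude, which matches the "full cache miss" behavior Cherny describes below.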
"Prompt cache misses when using 1M token context window are expensive. If you leave your computer for over an hour then continue a stale session, it's often a full cache miss," said Boris Cherny, creator of Claude Code.
Cherny also noted that larger contexts have become standard practice because developers are now "pulling in a large number of skills, or running many agents or background automations."
What Does Anthropic Say About the Changes?
Jarred Sumner, creator of the Bun JavaScript runtime who now works at Anthropic, acknowledged the analysis as "good detective work" but argued that the five-minute cache actually makes Claude Code cheaper overall. His reasoning: many Claude Code requests are one-shot calls where cached context is used once and never revisited, so the lower write cost of the five-minute cache benefits these quick interactions.
However, Sumner's explanation doesn't address the core complaint from long-session users. Swanson revised his analysis to acknowledge that subagents do benefit from faster interactions and lower write costs, but emphasized that his own experience contradicts Anthropic's claims. The extra burn rate, he stated, is "making a once great service unusable."
How to Manage Claude Code Usage and Cache Efficiency
- Monitor Session Duration: Keep individual coding sessions shorter to avoid cache expiration. If you step away from your computer for more than five minutes, expect the cache to reset when you return, triggering expensive reprocessing of your context.
- Batch Your Requests: Group multiple related coding tasks into a single session rather than spreading them across separate interactions, which helps maintain cache continuity and reduces token consumption.
- Optimize Context Size: Anthropic is investigating a 400,000-token default context window with an optional upgrade to one million tokens. Consider using the smaller default if you're burning through your quota, as it reduces the cost of a cache miss.
- Track Your Quota Usage: Monitor how quickly your monthly allowance depletes, especially after the March changes. Pro users ($20 per month) have reported exhausting quotas in as few as five hours, so awareness is critical.
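One workflow-level tactic that follows from the tips above: since cache reads are cheap and each use of the cache refreshes its TTL, a long session can in principle be kept warm by issuing a lightweight request before the five-minute window lapses. A hypothetical sketch; `send_ping` is a stand-in for whatever minimal request your client can make, and the 30-second safety margin is an assumption, not a documented value:

```python
import time

TTL_SECONDS = 5 * 60  # the 5-minute cache TTL described above
SAFETY_MARGIN = 30    # assumed margin: ping 30s before expiry

def keep_cache_warm(send_ping, should_stop, ttl=TTL_SECONDS, margin=SAFETY_MARGIN):
    """Periodically call send_ping() so the prompt cache never expires.

    Each ping is a cheap cache read that refreshes the TTL. Loops until
    should_stop() returns True (e.g. when the coding session ends).
    """
    while not should_stop():
        time.sleep(ttl - margin)  # wait until just before the cache expires
        send_ping()               # cheap request keeps the cached context alive
```

Note the trade-off: each keep-alive ping still costs a cache read (roughly 10 percent of base price on the cached tokens), so this only pays off when the alternative is a full cache miss on a large context.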
Cherny confirmed that Anthropic is actively investigating the context window issue and considering a 400,000-token default setting as a middle ground, with a configuration option available for users who need the full one-million-token window.
Are There Other Performance Issues Beyond Caching?
The quota complaints may not tell the whole story. Multiple developers have reported that Claude's overall performance has declined since late March. One enterprise team plan subscriber noted that sessions are now "getting stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of 'but wait, actually I need to do x' with slight variations." This mirrors complaints from an AI director at AMD, who has criticized Claude Code for becoming "dumber and lazier" since recent updates.
Some developers suspect that cache rebuilding and cache misses are major culprits, though one user cautioned that "before those are fixed likely any 5 minutes vs 1 hour discussion is entirely moot since numbers are totally flawed." The focus on cache optimization may also suggest that, beneath the surface, Anthropic's quotas are simply buying less processing time than they did previously.
What's Next for Claude?
Anthropic is preparing broader updates to its Claude ecosystem. Reports indicate that Claude Opus 4.7 is nearing release, following performance adjustments to its predecessor, Claude Opus 4.6. The company is also developing a full-stack app creation platform similar to Google AI Studio, designed to simplify AI application development workflows. Additionally, Claude Code is receiving a unified interface for managing multiple projects, and Anthropic has initiated beta testing for Claude integration into Microsoft Word.
For now, developers relying on Claude Code for intensive, long-session work face a difficult choice: adapt their workflows to the new cache constraints, upgrade to higher-tier subscriptions, or explore alternative AI coding assistants. Anthropic's investigation into context window defaults and cache optimization may eventually resolve these issues, but the current situation has clearly frustrated a segment of power users who built their workflows around Claude's previous capabilities.
" }