
Anthropic’s Claude Fable 5 is back on the API as of July 1, restored globally after a brief export-control suspension, and the benchmark story is genuinely impressive: 95% on SWE-bench Verified, 80.3% on the harder SWE-bench Pro, and no competitor within 6 points on either. If you write code for a living, it is the most capable model you can call today. The announcements have glossed over three things that matter to developers integrating it in production: a thinking mode you cannot turn off, a safety classifier that silently swaps out your model mid-session, and a data retention requirement that voids your zero-data-retention agreement. Here is what you need to know before you swap in claude-fable-5.
What Fable 5 Is
Fable 5 shares its underlying weights with Claude Mythos 5 — Anthropic’s restricted-access model for vetted cybersecurity and government use — but ships with safety classifiers enabled for the general API. It supports a 1M-token context window by default (no opt-in, no premium pricing) and can generate up to 128k output tokens per request. Pricing is $10 per million input tokens and $50 per million output tokens, with a 90% discount on prompt cache reads. That is exactly 2x Opus 4.8 across the board.
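To make the pricing concrete, here is a minimal cost sketch using only the rates quoted above. The function name and parameters are illustrative, and it assumes `input_tokens` counts only uncached input. Cache writes are not modeled because the article gives no price for them.

```python
# Prices in USD per million tokens, as quoted in the article.
INPUT_PER_MTOK = 10.00
OUTPUT_PER_MTOK = 50.00
CACHE_READ_DISCOUNT = 0.90  # cache reads are billed at 10% of the input price


def estimate_cost_usd(input_tokens, output_tokens, cached_read_tokens=0):
    """Estimate the cost of one request.

    Assumption: input_tokens excludes tokens served from the prompt cache,
    which are passed separately as cached_read_tokens.
    """
    cache_read_price = INPUT_PER_MTOK * (1 - CACHE_READ_DISCOUNT)
    total = (
        input_tokens * INPUT_PER_MTOK
        + cached_read_tokens * cache_read_price
        + output_tokens * OUTPUT_PER_MTOK
    )
    return total / 1_000_000
```

For example, a request with 200,000 uncached input tokens and 8,000 output tokens comes to roughly $2.40 under these rates.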
It is available today on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Azure AI Foundry. The model string is claude-fable-5.
| Model | SWE-bench Verified | SWE-bench Pro |
|---|---|---|
| Claude Fable 5 | 95.0% | 80.3% |
| Claude Opus 4.8 | 88.6% | 69.2% |
| GPT-5.5 | 82.6% | — |
| Gemini 3.5 Flash | 78.8% | — |
Breaking Change 1: Thinking Mode Is Always On
Fable 5’s adaptive thinking cannot be disabled. Passing thinking={"type": "disabled"} returns an HTTP 400 error. This catches Opus 4.8 users immediately: if your integration explicitly disabled extended thinking, that parameter needs to go before you ship.
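One way to guard against this during migration is to strip the setting before the request is built. The helper below is a sketch, not part of any SDK: `prepare_fable_params` is a hypothetical name, and it assumes your integration keeps its Messages API arguments in a plain dict.

```python
def prepare_fable_params(params):
    """Return a copy of the request kwargs without a thinking-disable setting.

    Other thinking configurations are left untouched, and the caller's
    dict is not mutated.
    """
    cleaned = dict(params)
    thinking = cleaned.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "disabled":
        del cleaned["thinking"]
    return cleaned
```

Calling it on `{"model": "claude-fable-5", "max_tokens": 1024, "thinking": {"type": "disabled"}}` returns the same dict minus the `thinking` key, which can then be passed as `client.messages.create(**cleaned)`.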
The subtler problem is prompt contamination. Fable 5’s safety classifier targets reasoning-extraction attempts — prompts that include phrases like “show your reasoning step by step,” “explain your thought process before answering,” or “walk me through your logic.” These are standard in Opus 4.8 chain-of-thought prompts and will trigger classifier scrutiny. Audit your system prompts before you migrate.
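A quick lint over your system prompts can surface the obvious cases. This is a rough sketch that only matches the three example phrases quoted above. The real classifier is certainly not a substring match, so treat a clean result as a starting point rather than a guarantee. The function name is illustrative.

```python
import re

# The three example phrases quoted in the article; extend this list as needed.
FLAGGED_PHRASES = [
    "show your reasoning step by step",
    "explain your thought process before answering",
    "walk me through your logic",
]


def find_flagged_phrases(prompt):
    """Return the flagged phrases found in prompt, ignoring case and extra whitespace."""
    normalized = re.sub(r"\s+", " ", prompt).lower()
    return [phrase for phrase in FLAGGED_PHRASES if phrase in normalized]
```

Running it across every stored system prompt and reviewing any non-empty result by hand is probably enough for a first pass.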
The cost implication: simple tasks now generate more tokens than they would on a model without forced thinking. Use streaming for anything non-trivial; a complex agentic task on a blocking call can sit silent for 60+ seconds.
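A streaming call might look like the sketch below. It relies on the Anthropic Python SDK's `client.messages.stream(...)` context manager and its `text_stream` iterator. The wrapper name `stream_reply` and its defaults are illustrative, and the client is passed in so the function carries no credentials of its own.

```python
def stream_reply(client, prompt, model="claude-fable-5", max_tokens=4096):
    """Print text chunks as they arrive and return the full reply text."""
    chunks = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:
            # Flush so the user sees progress instead of a silent wait.
            print(text, end="", flush=True)
            chunks.append(text)
    return "".join(chunks)
```

Typical usage would be `stream_reply(anthropic.Anthropic(), "Refactor this module...")` after `import anthropic`.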
Breaking Change 2: The Safety Classifier Routes Your Request Elsewhere
This is the one developers are learning about in production rather than in documentation. When Fable 5’s classifier fires, your request is not rejected; it is silently routed to Opus 4.8, which evaluates it and returns a response. The only signal is a notification that a different model handled the request, and your application still has to handle the different response characteristics and potential cost difference.
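The article does not say how that notification is delivered. The sketch below assumes it shows up as the `model` field on the response differing from the model you requested. That is an assumption, not confirmed behavior, and the helper name is hypothetical.

```python
def handled_by_other_model(requested_model, response_model):
    """Return True when the responding model is not the one requested.

    Assumption: a rerouted request reports the fallback model in the
    response's model field. Prefix matching tolerates identifiers that
    carry a suffix such as a date.
    """
    return not response_model.startswith(requested_model)
```

With the Python SDK this would be called as `handled_by_other_model("claude-fable-5", message.model)`, logging or tagging the result so rerouted responses can be tracked and costed separately.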
The three blocked categories are offensive cybersecurity (malware, exploit development, attack tooling), synthesis routes for controlled biological or chemical agents, and model distillation requests. The classifier fires in fewer than 5% of sessions on average, but the retrained cybersecurity classifier added in the July 1 redeployment is aggressive. It catches legitimate security-adjacent engineering work: writing test exploits for your own infrastructure, fuzzing systems you own, or analyzing malware samples defensively. Security engineering teams will hit this regularly.
One practical note: requests refused before any output is generated are not billed and do not count against rate limits. The cost risk is from classifier-routed requests that complete via Opus 4.8, which bill at Opus 4.8 rates.
Breaking Change 3: Data Retention — The Enterprise Blocker
Fable 5 mandates 30-day data retention on all requests. There is no ZDR option, no enterprise waiver, and the requirement overrides existing Zero Data Retention data processing agreements. If your organization negotiated a ZDR DPA with Anthropic, using Fable 5 voids it for that traffic — Anthropic confirmed this applies to Bedrock and GCP deployments as well.
Anthropic’s stated purpose is trust and safety review only: no model training, logged human access, deletion after 30 days. For regulated industries, that assurance is not a substitute for contractual ZDR protection. If your compliance posture requires ZDR, stay on Opus 4.8 or Sonnet 5, or restrict Fable 5 calls to data that does not touch regulated content.
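If you take the second route and restrict Fable 5 to unregulated data, the decision can be enforced in code rather than left to convention. This is a minimal sketch: the `touches_regulated_data` flag is hypothetical and has to come from your own data classification, and Opus 4.8 is used as the fallback because the article names it as a ZDR-compatible option.

```python
FABLE = "claude-fable-5"
ZDR_SAFE_FALLBACK = "claude-opus-4-8"


def model_for_request(preferred, touches_regulated_data):
    """Downgrade Fable 5 to the fallback model when the payload is regulated.

    Any other preferred model is returned unchanged.
    """
    if preferred == FABLE and touches_regulated_data:
        return ZDR_SAFE_FALLBACK
    return preferred
```

Placing this check at the single point where requests are built keeps regulated content from reaching Fable 5 through a code path nobody reviewed.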
How to Migrate
For Opus 4.8 users, migration is close to a one-line change. Fable 5 uses the same Messages API and tool use patterns. The minimum viable migration:
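```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-fable-5",  # was: claude-opus-4-8
    max_tokens=4096,
    # Do NOT pass thinking={"type": "disabled"}: it returns HTTP 400
    messages=[{
        "role": "user",
        "content": "Refactor this module into four testable units. Return a diff.",
    }],
)

# Thinking is always on, so the first content block may not be text;
# collect the text blocks rather than indexing content[0].
print("".join(block.text for block in message.content if block.type == "text"))
```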
Beyond the model string, check for: thinking-disable parameters, reasoning-extraction language in system prompts, and classifier response handling in your application code. The AWS Bedrock team published a migration guide covering platform-specific considerations.
The production pattern that is emerging: Sonnet 5 handles routine execution, Opus 4.8 takes judgment calls and code review, and Fable 5 is reserved for the hardest tasks — large-scale migrations, deep debugging, and decisions where being wrong is expensive. Early data shows Fable 5 completes equivalent agentic tasks in fewer tokens than Opus 4.8, which partially offsets the 2x pricing premium on complex work.
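That three-tier split can be sketched as a simple lookup. The tier names are illustrative, and `claude-sonnet-5` is a hypothetical identifier, since the article does not give a model string for Sonnet 5.

```python
from enum import Enum


class Tier(Enum):
    ROUTINE = "routine"    # routine execution
    JUDGMENT = "judgment"  # judgment calls and code review
    HARDEST = "hardest"    # large migrations, deep debugging


MODEL_BY_TIER = {
    Tier.ROUTINE: "claude-sonnet-5",  # hypothetical ID, not given in the article
    Tier.JUDGMENT: "claude-opus-4-8",
    Tier.HARDEST: "claude-fable-5",
}


def pick_model(tier):
    """Return the model string assigned to a task tier."""
    return MODEL_BY_TIER[tier]
```

How a task gets its tier is the hard part and is left to the caller. A reasonable starting point is to default to the routine tier and escalate only when a cheaper model has already failed.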
Should You Use It?
Fable 5 is worth it for: complex codebase migrations that need a 1M-token window, multi-day agentic workflows, and hard debugging problems where Opus 4.8 repeatedly falls short. It is not worth it for: high-volume routine coding (too expensive per token), ZDR-required enterprise environments, and security engineering teams who will hit classifier friction on a daily basis.
The benchmark lead is real and significant. The constraints are too. Knowing both before you commit to a migration will save time when you hit them in production. The full technical breakdown from Simon Willison is the most thorough independent analysis available if you want to go deeper.













