On February 24, 2026, Anthropic—the AI company founded by former OpenAI researchers explicitly to prioritize safety over speed—dropped its flagship safety commitment. The company’s Responsible Scaling Policy (RSP) no longer includes the hard pledge to pause AI development if safety measures can’t be guaranteed in advance. Instead, Anthropic now offers “non-binding safety roadmaps” and periodic risk reports. The timing is striking: just nine days after the Pentagon threatened to cancel Anthropic’s $200 million contract over military AI restrictions.
This is the moment AI safety commitments died: not from malice, but from market forces. If the company founded specifically to avoid the AI race now concedes that racing is unavoidable, voluntary safety commitments don’t work.
From Hard Limits to Soft Goals
Anthropic’s original 2023 commitment was a hard pledge: never train an AI system without guaranteed safety measures in place. RSP version 3.0, released February 24, replaces this with two tiers: realistic unilateral commitments Anthropic will pursue “regardless of what others do,” and aspirational industry-wide recommendations requiring coordination. The mandatory “pause if unsafe” tripwire is gone.
Development can now continue while implementing safety measures retroactively. The old policy was preventive—stop if you can’t prove it’s safe. The new policy is transparent—keep building, publish reports, grade yourself publicly. It’s accountability through disclosure rather than prevention through mandatory pauses.
What was removed: automatic capability thresholds triggering mandatory pauses. What remains: transparency through public reports every 3-6 months and external expert review. The shift is fundamental—from “we won’t build it if we can’t guarantee safety” to “we’ll build it and tell you how safe we think it is.”
The Prisoner’s Dilemma of AI Safety
Anthropic’s rationale reveals the core game-theory problem. Jared Kaplan, Anthropic’s Chief Science Officer, stated it directly: it “doesn’t make sense for us to make unilateral commitments if competitors are blazing ahead.” The company founded to prevent the AI race found itself running in one, and losing means irrelevance, not just slower growth.
The prisoner’s dilemma is simple: if both labs pause for safety, the world is safer. But if one lab pauses while the other races ahead, the responsible lab dies commercially and the reckless one shapes AI’s future. The Nash equilibrium, the outcome where neither player gains by changing strategy alone, is both labs racing, even though mutual cooperation would leave everyone better off. Anthropic just admitted this publicly.
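The dilemma can be made concrete with a two-player payoff matrix. The payoff numbers below are illustrative assumptions, not figures from any lab; the sketch just checks which outcomes are Nash equilibria, i.e., cells where neither player can improve its payoff by unilaterally switching strategy.

```python
from itertools import product

PAUSE, RACE = 0, 1

# Assumed payoffs for this sketch: (row player, column player).
payoffs = {
    (PAUSE, PAUSE): (3, 3),  # both pause: safest world, moderate payoff
    (PAUSE, RACE):  (0, 4),  # pausing lab loses the market entirely
    (RACE,  PAUSE): (4, 0),  # racing lab captures the market alone
    (RACE,  RACE):  (1, 1),  # both race: worse for everyone than mutual pause
}

def is_nash(a, b):
    """True if neither player gains by deviating alone from (a, b)."""
    row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in (PAUSE, RACE))
    col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in (PAUSE, RACE))
    return row_ok and col_ok

equilibria = [cell for cell in product((PAUSE, RACE), repeat=2) if is_nash(*cell)]
print(equilibria)  # [(1, 1)], i.e. (race, race) is the only equilibrium
```

Under these assumed payoffs, mutual pausing (3, 3) strictly beats mutual racing (1, 1), yet it is not an equilibrium: each lab gains by defecting to racing. That gap between the Pareto-optimal outcome and the Nash equilibrium is the collective action problem the article describes.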
Three forces made the original policy unsustainable: competitive pressure from OpenAI and Google, regulatory stagnation as governments prioritize competitiveness over safety, and the impossibility of meeting higher safety levels without industry-wide coordination. Voluntary commitments fail when being responsible is a competitive disadvantage.
The Company Built to Avoid This
The irony is devastating. Anthropic was founded in 2021 by Dario and Daniela Amodei—former OpenAI VPs—specifically because they were concerned about OpenAI’s direction and the risks of an unconstrained AI race. The company’s founding thesis was “we’ll be the responsible ones who pause if needed.” Five years later, they’ve come full circle.
In 2020-2021, Dario Amodei and 14 other OpenAI researchers left over safety concerns, fearing “industrial capture” as Microsoft deepened its partnership with OpenAI. They founded Anthropic as a public benefit corporation, structured specifically to prioritize mission over profits. In 2023, they introduced the RSP with hard pause commitments as their core differentiator. In 2026, competitive pressure forced them to abandon exactly what they were founded to protect.
As Decrypt reported, the pattern is industry-wide: OpenAI removed “safely” from its mission statement in 2024, Google DeepMind offers “responsible AI” principles without hard limits, and Meta open-sourced models despite safety concerns. All major labs are converging toward minimal voluntary commitments under competitive pressure. The company created to avoid the race admits racing is unavoidable.
The $200M Question
Nine days before Anthropic released RSP v3.0, the Pentagon gave the company a Friday ultimatum: lift restrictions on Claude’s military use or face contract cancellation. Secretary Pete Hegseth specifically demanded Claude be allowed for AI-controlled weapons and mass surveillance. Anthropic’s $200 million contract was at stake, along with threats of “supply chain risk” designation that would limit the company’s ability to work with any defense vendors.
Anthropic officially denies the policy change is connected to Pentagon pressure. But the timing raises questions about how government contracts, competitive dynamics, and billions in funding create overwhelming pressure to weaken safety commitments. Even if the RSP change isn’t directly caused by the Pentagon ultimatum, the situation illustrates the environment voluntary commitments must survive in—and they can’t.
For Developers and the Industry
For AI developers and tech professionals, this changes the trust calculation. “Working at the safety-focused company” is no longer meaningful differentiation. Corporate safety commitments from any lab should be viewed with skepticism as market pressures systematically erode them. OpenAI, Google, Anthropic—all are converging toward the same competitive equilibrium.
The lesson: evaluate AI labs on actual practices, research output, and transparency—not marketing claims about “responsible AI.” AI safety research may be more impactful at nonprofits and academia than at competitive labs racing for market dominance. And for policymakers, this is definitive proof voluntary commitments don’t work. The prisoner’s dilemma requires regulatory solutions, not corporate promises.
As TIME reported, Chris Painter of METR warned of a “frog-boiling” effect where dangers incrementally escalate without triggering clear thresholds. Anthropic’s policy change acknowledges this reality: in competitive markets, safety commitments collapse. Individual developer ethics matter more as corporate frameworks fail.
Key Takeaways
- Anthropic abandoned its hard commitment to pause AI development if safety can’t be guaranteed, replacing it with non-binding roadmaps and periodic risk reports—the company founded on safety-first principles admits market forces override voluntary commitments
- The prisoner’s dilemma is real: pausing for safety while competitors race ahead means commercial death, creating a Nash equilibrium where all companies must race even though cooperation would be safer
- Corporate safety commitments across all major AI labs are collapsing under competitive pressure—OpenAI removed “safely” from its mission, Anthropic weakened its RSP, and the pattern is industry-wide convergence toward minimal voluntary limits
- For developers, “safety-focused” companies aren’t meaningfully different anymore—evaluate labs on actual research output and transparency, not marketing, and recognize individual ethics matter more as corporate frameworks fail
- This proves voluntary AI safety commitments don’t work in competitive markets—the collective action problem requires regulatory solutions, not corporate promises that systematically erode under market pressure






