Anthropic Drops Flagship Safety Pledge
Anthropic’s flagship “we will pause without proven mitigations” pledge is gone, replaced by board discretion and competitor-matching in a race that rewards speed over safety.
Feb 24, 2026
Summary
Anthropic has removed its 2023 commitment not to train or release advanced AI models unless it could guarantee in advance that its safety measures were adequate.
The company replaced a categorical internal constraint with a competitor-relative standard, periodic risk disclosures, and a conditional promise to delay development only when leadership believes Anthropic is leading the AI race and judges catastrophic risk significant.
In practice, frontier AI development can proceed without a pre-verified safety backstop, shifting public reliance to voluntary reporting and board-approved judgment calls.
Reality Check
Voluntary self-regulation collapsing under competitive pressure is how high-risk industries drift from enforceable guardrails into after-the-fact damage control; our safety becomes contingent on private incentives we cannot vote on or audit in real time. Nothing described here is likely criminal on its face: rewriting an internal policy and choosing to continue training is generally lawful absent fraud or a specific statutory duty. The core problem is therefore governance, in that a categorical public-facing constraint has been swapped for discretionary thresholds and competitor-relative promises. When “pause unless safe” becomes “go unless we’re ahead and catastrophe seems significant,” the precedent is clear: the market sets the pace, and accountability shifts from rules to narrative-managed reporting.
Detail
<p>Anthropic overhauled its Responsible Scaling Policy and scrapped its 2023 pledge that it would not train an AI system unless it could guarantee in advance that its safety measures were adequate. Chief science officer Jared Kaplan said the company concluded it “wouldn’t actually help anyone” to stop training models, citing rapid advances and competitors moving ahead.</p><p>The updated policy adds commitments to increase transparency about safety risks, including additional disclosures about how Anthropic’s models perform in safety testing. It also commits Anthropic to matching or surpassing competitors’ safety efforts and to publishing “Frontier Safety Roadmaps” describing planned safety measures, alongside “Risk Reports” every three to six months assessing capabilities, threat models, mitigations, and overall risk.</p><p>The revised policy promises to “delay” development only if leaders both believe Anthropic is leading the AI race and view catastrophic risk as significant. The policy was approved unanimously by CEO Dario Amodei and Anthropic’s board after months of internal discussion.</p>