Anthropic overhauled its Responsible Scaling Policy and scrapped its 2023 pledge not to train an AI system unless it could guarantee in advance that its safety measures were adequate. Chief science officer Jared Kaplan said the company concluded it “wouldn’t actually help anyone” to stop training models, citing rapid advances and competitors moving ahead.
The updated policy adds commitments to increase transparency about safety risks, including additional disclosures about how Anthropic’s models perform in safety testing. It also commits Anthropic to matching or surpassing competitors’ safety efforts and to publishing “Frontier Safety Roadmaps” describing planned safety measures, alongside “Risk Reports” every three to six months assessing capabilities, threat models, mitigations, and overall risk.
The revised policy promises to “delay” development only if Anthropic’s leaders both believe the company is leading the AI race and view catastrophic risk as significant. CEO Dario Amodei and Anthropic’s board approved the policy unanimously after months of internal discussion.