Anthropic overhauled its Responsible Scaling Policy and scrapped its 2023 pledge that it would not train an AI system unless it could guarantee in advance that its safety measures were adequate. Chief science officer Jared Kaplan said the company concluded it "wouldn't actually help anyone" to stop training models, citing rapid advances and competitors moving ahead.
The updated policy adds commitments to increase transparency about safety risks, including additional disclosures about how Anthropic's models perform in safety testing. It also commits Anthropic to matching or surpassing competitors' safety efforts and to publishing "Frontier Safety Roadmaps" describing planned safety measures, alongside "Risk Reports" every three to six months assessing capabilities, threat models, mitigations, and overall risk.
The revised policy promises to "delay" development only if leaders both believe Anthropic is leading the AI race and view catastrophic risk as significant. The policy was approved unanimously by CEO Dario Amodei and Anthropic's board after months of internal discussion.