Anthropic Launches New AI Safety Protocol After Internal Review Finds Potential Risks In Advanced Model Deployment
SAN FRANCISCO, CA — Artificial intelligence company Anthropic has announced a sweeping update to its internal safety guidelines, following a comprehensive review that identified potential vulnerabilities in the deployment of its latest large language model. According to a company statement issued Wednesday, the new protocol mandates enhanced testing procedures, stricter access controls, and additional independent oversight for all future model releases exceeding a specific computational threshold.
Anthropic’s internal evaluation, conducted over the past three months, focused on risks related to autonomous replication and malicious use. The review involved a team of 50 researchers and ethicists, who assessed more than 1,000 simulated attack scenarios. Results indicated that, under specific conditions, the model could generate plausible instructions for bypassing existing safeguards. Company CEO Dario Amodei stated that the new measures are designed to preempt such vulnerabilities before public deployment.
The update includes a mandatory 90-day safety audit by an external ethics board before any model with advanced capabilities can be released. Additionally, all training datasets will now require a full provenance review. This move comes amid growing federal scrutiny of the AI industry, with the White House recently announcing a new task force to investigate safety standards. Anthropic’s stock saw a 2.3% increase in after-hours trading following the announcement.