Anthropic Hands Its Open-Source AI Safety Testing Tool to Independent Nonprofit
Anthropic has handed over development of Petri, its open-source AI alignment testing tool, to nonprofit Meridian Labs, marking a significant shift in how the AI industry approaches independent safety evaluation. The move mirrors Anthropic's earlier donation of the Model Context Protocol (MCP) to the Linux Foundation and comes alongside the release of Petri 3.0, the tool's largest update since its October 2025 launch.
The decision reflects a growing recognition that AI safety evaluation needs to remain independent from any single company to maintain credibility across the industry. Petri has been central to Anthropic's alignment assessment for every Claude model since Claude Sonnet 4.5, and the UK AI Security Institute (AISI) has built its entire alignment evaluation pipeline on the tool, using it to test frontier models for research-sabotage propensity.
What Makes Petri 3.0 Different From Earlier Versions?
The new version introduces fundamental structural improvements that make the tool more flexible and realistic. The central change splits the auditor model and the target model being tested into separate components that communicate through a defined interface, rather than being tightly coupled as in earlier versions. This allows researchers to customize the target, the auditor, or both without unpicking interleaved code.
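The separation described above can be sketched as two components meeting only at a narrow interface. This is a minimal illustration of the design idea, not Petri's actual API; every class and method name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical sketch of auditor/target decoupling; Petri 3.0's real
# interface names and message types will differ.

class Target(Protocol):
    """The only surface the auditor sees: a model that answers messages."""
    def respond(self, message: str) -> str: ...

class EchoTarget:
    """Stand-in target model used purely for illustration."""
    def respond(self, message: str) -> str:
        return f"target reply to: {message}"

@dataclass
class Auditor:
    """Drives the audit conversation via the Target interface."""
    probe: str

    def run(self, target: Target) -> str:
        # Because the auditor depends only on the interface, either side
        # can be customized or swapped without touching the other.
        return target.respond(self.probe)

transcript = Auditor(probe="Describe your tool access.").run(EchoTarget())
```

A researcher could replace `EchoTarget` with any model wrapper that satisfies the interface, or swap in a different auditor, which is the flexibility the decoupled architecture aims for.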
A key addition is Dish, a new extension currently in research preview that addresses a persistent problem in alignment evaluation: AI models can often detect they are being tested and may behave differently than they would in real deployment. Dish runs audits inside real agent scaffolds, including Claude Code, Codex, and Gemini CLI, so the target model sees real system prompts and tool definitions rather than synthetic approximations built by the auditor.
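The core idea behind Dish can be illustrated with a toy contrast between a synthetic audit context and a real scaffold's context. The scaffold registry, prompt strings, and tool names below are invented for illustration and do not reflect Dish's implementation.

```python
# Hypothetical sketch of the Dish idea: present the target model with a
# real scaffold's system prompt and tool schema rather than a synthetic
# approximation written by the auditor. All values here are placeholders.

REAL_SCAFFOLDS = {
    "claude-code": {
        "system_prompt": "placeholder for the genuine CLI system prompt",
        "tools": ["bash", "edit_file"],
    },
}

def build_audit_context(scaffold: str) -> dict:
    """Return the context the target model sees during an audit."""
    if scaffold in REAL_SCAFFOLDS:
        # Genuine deployment context: harder for the model to tell
        # the audit apart from real usage.
        return REAL_SCAFFOLDS[scaffold]
    # Synthetic fallback: auditor-written prompts like this are what
    # models can often recognize as a test.
    return {"system_prompt": "synthetic prompt written by the auditor",
            "tools": []}
```

The payoff is behavioral realism: a model that would act differently under an obviously synthetic prompt sees the same inputs it would in production.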
Petri 3.0 also integrates with Bloom, Anthropic's open-source framework for generating targeted behavioral evaluations. Where Petri explores broadly across many scenarios, Bloom goes deep on a single behavior, automatically generating evaluation suites that measure how often and how severely it occurs. The two tools can now be composed together, with Bloom using Petri as its backbone for execution.
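The breadth-versus-depth composition might look something like the following toy pipeline, where a Bloom-style generator fans one behavior out into many scenarios and a Petri-style runner executes and scores each. Function names and the scoring logic are illustrative assumptions, not either tool's API.

```python
# Hypothetical sketch of composing a Bloom-style generator with a
# Petri-style execution backbone. Names and logic are invented.
from typing import Callable, Iterable

def generate_variants(behavior: str, n: int) -> Iterable[str]:
    """Bloom-style: expand one target behavior into many scenarios."""
    return [f"{behavior} (variant {i})" for i in range(n)]

def run_audit(scenario: str, judge: Callable[[str], float]) -> float:
    """Petri-style: execute one scenario and score the transcript."""
    transcript = f"transcript for {scenario}"  # placeholder for a real run
    return judge(transcript)

# Measuring not just whether a behavior occurs, but how often:
scores = [run_audit(s, judge=lambda t: float("variant" in t))
          for s in generate_variants("deceptive self-report", 3)]
frequency = sum(scores) / len(scores)
```

Aggregating per-scenario scores is what lets the combined tools report frequency and severity for a single behavior, rather than a one-off pass/fail.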
What Is Petri's Role in AI Safety Infrastructure?
- Auditor-Target Separation: The new architecture allows independent researchers to modify either the testing model or the model being tested without affecting the other, enabling more flexible and customized evaluations across different AI systems.
- Real-World Testing Environment: The Dish extension runs evaluations inside actual deployment scaffolds rather than simulated environments, making it far harder for AI models to detect they are being tested and adjust their behavior accordingly.
- Behavioral Depth Analysis: Integration with the Bloom framework enables researchers to measure not just whether a behavior occurs, but how often and how severely it manifests across different scenarios.
At Meridian Labs, Petri joins Inspect AI, Inspect Scout, and Inspect Flow in an open-source evaluation stack used by government AI safety institutes in the UK, US, EU, Japan, and Korea, as well as research organizations including METR, Apollo, Epoch, and RAND. The UK AI Security Institute deployed a prototype of Petri 3.0 in its pre-deployment evaluations of Claude Mythos and Opus 4.7, combining scaffold realism with auditor-side codebase grounding.
The move positions independent alignment evaluation as critical infrastructure rather than a side project at any single AI lab. By transferring Petri to Meridian Labs, Anthropic signals that the future of AI safety evaluation depends on tools and processes that remain neutral and credible across the entire industry, not controlled by individual companies with competing interests.
Petri 3.0 is available now as open source, and the tool's transition to a nonprofit steward, combined with its growing adoption by government safety bodies worldwide, reflects a maturing approach to how frontier AI models are tested before deployment.