Anthropic Hands Its Open-Source AI Safety Testing Tool to Independent Nonprofit
Anthropic has handed over development of Petri, its open-source AI alignment testing tool, to nonprofit Meridian Labs, marking a significant shift in how the AI industry approaches independent safety evaluation. The move mirrors Anthropic's earlier donation of the Model Context Protocol (MCP) to the Linux Foundation and comes alongside the release of Petri 3.0, the tool's largest update since its October 2025 launch.
The decision reflects a growing recognition that AI safety evaluation needs to remain independent from any single company to maintain credibility across the industry. Petri has been central to Anthropic's alignment assessment for every Claude model since Claude Sonnet 4.5, and the UK AI Security Institute (AISI) has built its entire alignment evaluation pipeline on the tool, using it to test frontier models for research-sabotage propensity.
What Makes Petri 3.0 Different From Earlier Versions?
The new version introduces fundamental structural improvements that make the tool more flexible and realistic. The central change splits the auditor model and the target model being tested into separate components that communicate through a defined interface, rather than being tightly coupled as in earlier versions. This allows researchers to customize the target, the auditor, or both without unpicking interleaved code.
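The separation described above can be sketched as two components meeting only at a narrow interface. This is a minimal illustration of the design idea, not Petri's actual API; every class and method name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical sketch of auditor/target decoupling; Petri 3.0's real
# interface names and message types will differ.

class Target(Protocol):
    """The only surface the auditor sees: a model that answers messages."""
    def respond(self, message: str) -> str: ...

class EchoTarget:
    """Stand-in target model used purely for illustration."""
    def respond(self, message: str) -> str:
        return f"target reply to: {message}"

@dataclass
class Auditor:
    """Drives the audit conversation via the Target interface."""
    probe: str

    def run(self, target: Target) -> str:
        # Because the auditor depends only on the interface, either side
        # can be customized or swapped without touching the other.
        return target.respond(self.probe)

transcript = Auditor(probe="Describe your tool access.").run(EchoTarget())
```

A researcher could replace `EchoTarget` with any model wrapper that satisfies the interface, or swap in a different auditor, which is the flexibility the decoupled architecture aims for.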
A key addition is Dish, a new extension currently in research preview that addresses a persistent problem in alignment evaluation: AI models can often detect they are being tested and may behave differently than they would in real deployment. Dish runs audits inside real agent scaffolds, including Claude Code, Codex, and Gemini CLI, so the target model sees real system prompts and tool definitions rather than synthetic approximations built by the auditor.
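The core idea behind Dish can be illustrated with a toy contrast between a synthetic audit context and a real scaffold's context. The scaffold registry, prompt strings, and tool names below are invented for illustration and do not reflect Dish's implementation.

```python
# Hypothetical sketch of the Dish idea: present the target model with a
# real scaffold's system prompt and tool schema rather than a synthetic
# approximation written by the auditor. All values here are placeholders.

REAL_SCAFFOLDS = {
    "claude-code": {
        "system_prompt": "placeholder for the genuine CLI system prompt",
        "tools": ["bash", "edit_file"],
    },
}

def build_audit_context(scaffold: str) -> dict:
    """Return the context the target model sees during an audit."""
    if scaffold in REAL_SCAFFOLDS:
        # Genuine deployment context: harder for the model to tell
        # the audit apart from real usage.
        return REAL_SCAFFOLDS[scaffold]
    # Synthetic fallback: auditor-written prompts like this are what
    # models can often recognize as a test.
    return {"system_prompt": "synthetic prompt written by the auditor",
            "tools": []}
```

The payoff is behavioral realism: a model that would act differently under an obviously synthetic prompt sees the same inputs it would in production.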
Petri 3.0 also integrates with Bloom, Anthropic's open-source framework for generating targeted behavioral evaluations. Where Petri explores broadly across many scenarios, Bloom goes deep on a single behavior, automatically generating evaluation suites that measure how often and how severely it occurs. The two tools can now be composed together, with Bloom using Petri as its backbone for execution.
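The breadth-versus-depth composition might look something like the following toy pipeline, where a Bloom-style generator fans one behavior out into many scenarios and a Petri-style runner executes and scores each. Function names and the scoring logic are illustrative assumptions, not either tool's API.

```python
# Hypothetical sketch of composing a Bloom-style generator with a
# Petri-style execution backbone. Names and logic are invented.
from typing import Callable, Iterable

def generate_variants(behavior: str, n: int) -> Iterable[str]:
    """Bloom-style: expand one target behavior into many scenarios."""
    return [f"{behavior} (variant {i})" for i in range(n)]

def run_audit(scenario: str, judge: Callable[[str], float]) -> float:
    """Petri-style: execute one scenario and score the transcript."""
    transcript = f"transcript for {scenario}"  # placeholder for a real run
    return judge(transcript)

# Measuring not just whether a behavior occurs, but how often:
scores = [run_audit(s, judge=lambda t: float("variant" in t))
          for s in generate_variants("deceptive self-report", 3)]
frequency = sum(scores) / len(scores)
```

Aggregating per-scenario scores is what lets the combined tools report frequency and severity for a single behavior, rather than a one-off pass/fail.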
What Is Petri's Role in AI Safety Infrastructure?
- Auditor-Target Separation: The new architecture allows independent researchers to modify either the testing model or the model being tested without affecting the other, enabling more flexible and customized evaluations across different AI systems.
- Real-World Testing Environment: The Dish extension runs evaluations inside actual deployment scaffolds rather than simulated environments, making it far harder for AI models to detect they are being tested and adjust their behavior accordingly.
- Behavioral Depth Analysis: Integration with the Bloom framework enables researchers to measure not just whether a behavior occurs, but how often and how severely it manifests across different scenarios.
At Meridian Labs, Petri joins Inspect AI, Inspect Scout, and Inspect Flow in an open-source evaluation stack used by government AI safety institutes in the UK, US, EU, Japan, and Korea, as well as research organizations including METR, Apollo, Epoch, and RAND. The UK AI Security Institute deployed a prototype of Petri 3.0 in its pre-deployment evaluations of Claude Mythos and Opus 4.7, combining scaffold realism with auditor-side codebase grounding.
The move positions independent alignment evaluation as critical infrastructure rather than a side project at any single AI lab. By transferring Petri to Meridian Labs, Anthropic signals that the future of AI safety evaluation depends on tools and processes that remain neutral and credible across the entire industry, not controlled by individual companies with competing interests.
Petri 3.0 is available now as open source, and the tool's transition to a nonprofit steward, combined with its growing adoption by government safety bodies worldwide, reflects a maturing approach to how frontier AI models are tested before deployment.