Claude's Hidden 'Undercover Mode' Raises Questions About AI Transparency in Open Source
Anthropic's Claude Code includes a feature called 'Undercover Mode' that automatically removes evidence of AI assistance from public code contributions made by Anthropic employees, according to source code leaked in March 2026. The system strips internal model codenames, unreleased version numbers, and Co-Authored-By lines from commit messages and pull requests when employees contribute to open-source repositories. While designed to prevent accidental leaks of proprietary information, the feature has sparked debate about whether it conflicts with emerging transparency norms in the open-source community.
What Exactly Is Undercover Mode and How Does It Work?
Undercover Mode activates automatically when Claude Code detects that an Anthropic employee (identified by USER_TYPE === 'ant') is working in a public or open-source repository. Once triggered, the system injects specific instructions into Claude's operating parameters that prevent certain information from appearing in public commits and pull requests.
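Based on the reported trigger condition, the activation check can be sketched as a small predicate. This is an illustrative reconstruction, not the leaked code itself: apart from the `USER_TYPE === 'ant'` comparison described in coverage of the leak, the function and parameter names here are assumptions.

```typescript
// Hypothetical reconstruction of the Undercover Mode trigger check.
// Only the USER_TYPE === "ant" comparison is documented in coverage
// of the leak; everything else here is an illustrative assumption.
function undercoverModeActive(userType: string, repoIsPublic: boolean): boolean {
  // Both conditions must hold: an Anthropic employee ("ant" user)
  // working in a public or open-source repository.
  return userType === "ant" && repoIsPublic;
}
```

Note that the check is conjunctive: an employee working in a private internal repository, or an external user in a public one, would not trip it.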
The feature removes several categories of internal information from public contributions:
- Internal Model Codenames: Animal names like Capybara and Tengu that Anthropic uses internally for unreleased models
- Version Numbers: Unreleased model version identifiers that could expose product roadmap information
- Internal Communication Channels: References to Slack channels and internal short links like go/cc that reveal internal infrastructure
- AI Attribution Signals: Co-Authored-By lines and any mention that an AI contributed to the code
- Project Identifiers: Internal repository or project names and the phrase "Claude Code" itself
A critical technical detail: once Undercover Mode is triggered in an internal build, there is no force-off switch. The mode operates as a one-way door, according to analysis from the GitHub breakdown of the leaked source. In external builds distributed to regular users, the entire function is eliminated during compilation, meaning Undercover Mode affects only Anthropic employees.
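The compile-time elimination works like standard dead-code stripping in JavaScript toolchains. In the sketch below, `IS_INTERNAL_BUILD` stands in for a bundler-injected constant (for example, esbuild's `define` option); this is a generic illustration of the technique, not Anthropic's actual build configuration.

```typescript
// Sketch of compile-time dead code elimination. In a real pipeline a
// bundler substitutes a literal for IS_INTERNAL_BUILD at build time;
// with `false`, the minifier drops the unreachable branch, so the
// feature's code never ships in external builds at all.
const IS_INTERNAL_BUILD = false; // stand-in for a bundler-injected constant

function maybeEnableUndercoverMode(userType: string, repoIsPublic: boolean): boolean {
  if (IS_INTERNAL_BUILD) {
    // One-way door: reportedly no force-off switch once triggered.
    return userType === "ant" && repoIsPublic;
  }
  return false; // external builds never activate the mode
}
```

This explains why external users were never affected: the relevant branch is not merely disabled in their builds, it is absent from the shipped bundle.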
Why Did Anthropic Build This Feature in the First Place?
The original intent behind Undercover Mode is straightforward and arguably reasonable. Anthropic engineers increasingly use Claude Code when contributing to open-source projects, which is a legitimate and growing workflow across the AI industry. The problem the feature addresses is real: an AI system with access to internal context can accidentally leak sensitive information through commit messages or pull request descriptions.
A commit message mentioning an unreleased model version number or a pull request referencing an internal Slack channel could expose roadmap information that nobody intended to make public. In its narrowest reading, Undercover Mode functions as a data-hygiene tool designed to prevent accidental internal information exposure. Stripping internal codenames from public commit messages and preventing references to internal communication channels from appearing in public pull requests are both reasonable precautions.
However, the language used in the system prompt complicates this straightforward interpretation. The prompt opens with "Do not blow your cover," which frames the situation not simply as "avoid leaking internal data" but rather as "maintain a cover story." Those are meaningfully different orientations, and the distinction has shaped how the developer community reacted to the disclosure.
How Has the Developer Community Reacted to This Disclosure?
The response from developers and open-source advocates has split into three distinct camps, each raising different concerns about the implications of Undercover Mode.
The first group focused on the irony of the situation: Anthropic built an entire subsystem specifically designed to prevent internal information leaks, and then that subsystem was itself exposed through the largest leak in the company's history. The source code containing Undercover Mode was shipped in a .map file that anyone could download. Anthropic acknowledged the incident as "a release packaging issue caused by human error, not a security breach," according to coverage of the company's response.
The second camp zeroed in on the suppression of Co-Authored-By attribution signals. Most major AI coding tools, including GitHub Copilot, leave attribution signals in commit metadata when they assist with code. Actively stripping those signals specifically from public open-source repositories puts Claude Code in a different category. This matters because open-source contribution norms depend on knowing who or what contributed what. The Developer Certificate of Origin (DCO), which many open-source projects use as a lightweight attestation framework, requires contributors to certify they have the right to submit their work. An AI contributor instructed to remove all evidence of being an AI creates tension with that framework.
The third group argued that every company has internal tooling with unusual properties, and Claude Code's just happens to be unusually visible now. They emphasized an important clarification: there is no evidence that Undercover Mode affects regular Claude Code users outside Anthropic. The trigger condition is specific to Anthropic employees in public repositories only.
What Do Open-Source Communities Actually Expect Regarding AI Disclosure?
The Undercover Mode disclosure arrived in the middle of an already-heated debate about AI transparency in open-source development. Red Hat published a thorough analysis in late 2025 arguing that transparent disclosure of AI assistance is increasingly treated as a cultural norm in open-source communities, even when it is not yet legally mandated.
Different projects have adopted different approaches. Some, like QEMU, have adopted explicit policies banning AI-generated contributions outright, largely because of uncertainty about Developer Certificate of Origin compliance. Fedora has gone in the opposite direction, requiring disclosure through "Assisted-by:" tags but not prohibiting AI involvement. The Linux Foundation's position, which underpins the DCO that governs contribution standards for thousands of major projects, is that the framework was designed around human authorship and has not fully caught up with AI-assisted workflows.
What Undercover Mode does is opt out of that emerging norm at exactly the point where it might create the most friction: contributions from an AI company to public open-source projects. This is not the same as an individual developer quietly using Copilot for boilerplate code. The asymmetry of information is meaningfully different when the party making contributions has built the AI tool, employs the engineers using it, and has designed the system to suppress disclosure.
How to Evaluate AI Coding Tools for Your Open-Source Team
If your organization contributes to open-source projects and is evaluating AI coding assistants, several key questions are worth asking about any tool you consider adopting:
- Attribution Transparency: Does the tool attribute AI assistance in commit metadata by default, or does it require manual configuration to enable attribution signals?
- Disclosure Requirements: Does the tool support or require disclosure tags like "Assisted-by:" or "Co-Authored-By" when AI contributes to code, and can these be disabled?
- Internal vs. External Behavior: Does the tool behave differently when used by employees of the company that built it versus external users, and if so, how?
- Compliance Alignment: Does the tool's default behavior align with the Developer Certificate of Origin and other open-source contribution frameworks your projects use?
- Configuration Control: Can you force attribution on or off, or does the tool make these decisions automatically based on context?
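One practical way to act on the attribution questions above is to audit your project's recent commit messages for AI-attribution trailers. The sketch below assumes you have already collected commit messages (for example, from `git log --format=%B`); the function name is hypothetical, and the trailer names follow the Git "Co-Authored-By:" and Fedora "Assisted-by:" conventions mentioned earlier.

```typescript
// Sketch: count how many commit messages carry an AI-attribution
// trailer, so a team can see whether its tooling leaves disclosure
// signals. Trailer names follow the Co-Authored-By and Assisted-by
// conventions; extend the list for other tags your projects use.
const ATTRIBUTION_TRAILERS = [/^Co-Authored-By:/im, /^Assisted-by:/im];

function countAttributedCommits(messages: string[]): number {
  return messages.filter((msg) =>
    ATTRIBUTION_TRAILERS.some((re) => re.test(msg))
  ).length;
}
```

Comparing this count against what you know about your team's actual AI tool usage gives a rough signal of whether attribution is being recorded or silently dropped.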
These questions matter because they determine whether using an AI coding assistant creates compliance risks or transparency gaps for your open-source contributions.
What Other Features Were Revealed in the Same Leak?
Undercover Mode was not the only significant finding in the leaked Claude Code source. The same disclosure contained 108 feature-gated modules that were stripped from external builds through compile-time dead code elimination. These included KAIROS, described as a persistent autonomous background agent that watches a user's working environment and acts without prompts; ULTRAPLAN; VOICE_MODE; and a virtual pet system called BUDDY with 18 species, deterministic per-user seeding, and a 1% shiny variant chance.
However, Undercover Mode earned its own conversation because it is not a future feature under development. It is current, active behavior in a production tool used by the people who build Claude. The central irony remains: Undercover Mode was built to prevent leaks, yet the .map file that exposed it was likely shipped as a result of human error in the build pipeline. Anthropic has since pulled the affected package version and committed to process changes.
The disclosure highlights a broader tension in AI development: the gap between what companies intend with their internal tools and how those tools appear to external communities. Undercover Mode was probably not a cynical decision, but rather a practical one that may not have been carefully examined through a transparency lens. That gap between intent and optics is exactly what the developer community continues to react to.