FrontierNews.ai

The Alignment Tax: Why Claude Got Worse After the Government Ban

Claude users have reported noticeably worse performance since Anthropic shut down its Fable 5 and Mythos 5 models following a U.S. government export control order on June 12. The degradation they describe is consistent with a well-documented phenomenon in artificial intelligence: the alignment tax, a measurable reduction in model accuracy that can occur when safety constraints are added through fine-tuning.

What Is the Alignment Tax and Why Does It Happen?

The alignment tax is the documented loss of core capabilities that results when a large language model (LLM), a type of AI trained on vast amounts of text, undergoes safety fine-tuning. Multiple independent research teams have confirmed this effect across different model architectures and training techniques, including those used by OpenAI, Meta, Mistral, and Anthropic itself.

The mechanism behind it is straightforward but difficult to fix. When a model is safety fine-tuned using methods like Reinforcement Learning from Human Feedback (RLHF), a technique in which human evaluators rate model outputs to guide training, the process adjusts the model's internal parameters so that it produces more of the outputs rated as safe. However, the gradients (the mathematical signals that determine which direction each parameter moves) that push a model toward safety frequently conflict with those that would improve accuracy, so a step that lowers the safety loss can raise the loss on the model's original tasks. Each adjustment that makes the model more cautious tends to move it slightly away from the configuration that made it most accurate.
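To make the gradient-conflict idea concrete, here is a minimal two-parameter sketch in Python. The two quadratic losses, `task_loss` and `safety_loss`, are hypothetical stand-ins chosen for illustration, not objectives from any real model or from the research cited here. A negative cosine similarity between the two gradients means that, to first order, a step that reduces the safety loss increases the task loss; running gradient descent on the safety loss alone, starting from the task-optimal point, shows that tradeoff directly.

```python
# Toy illustration of gradient conflict. Both losses are hypothetical quadratics
# over a two-parameter "model"; real LLMs have billions of parameters.


def task_loss(theta):
    # Minimized at (1, 1): the hypothetical "most accurate" configuration.
    return (theta[0] - 1.0) ** 2 + (theta[1] - 1.0) ** 2


def task_grad(theta):
    return [2.0 * (theta[0] - 1.0), 2.0 * (theta[1] - 1.0)]


def safety_loss(theta):
    # Minimized at (-1, 1): a hypothetical "more cautious" configuration.
    return (theta[0] + 1.0) ** 2 + (theta[1] - 1.0) ** 2


def safety_grad(theta):
    return [2.0 * (theta[0] + 1.0), 2.0 * (theta[1] - 1.0)]


def cosine(u, v):
    """Cosine similarity; a negative value means the gradients conflict."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def safety_finetune(theta, lr=0.1, steps=10):
    """Gradient descent on the safety loss only; records (safety, task) losses per step."""
    theta = list(theta)
    history = [(safety_loss(theta), task_loss(theta))]
    for _ in range(steps):
        g = safety_grad(theta)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
        history.append((safety_loss(theta), task_loss(theta)))
    return history


if __name__ == "__main__":
    point = [0.5, 0.5]
    print("gradient cosine at (0.5, 0.5):", round(cosine(task_grad(point), safety_grad(point)), 3))
    for step, (s, t) in enumerate(safety_finetune([1.0, 1.0], lr=0.1, steps=5)):
        print(f"step {step}: safety loss {s:.3f}, task loss {t:.3f}")
```

Starting from the task-optimal point (1, 1), every safety step lowers `safety_loss` and raises `task_loss`; in this deliberately simple setup the two objectives cannot both be minimized at once.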

"The first challenge is the so-called alignment tax, which refers to the fact that incorporating safety alignment has an adverse effect on the accuracy of a model's outputs," explained Dr. Jung-Eun Kim, a computer science professor at North Carolina State University whose team presented this research at the International Conference on Learning Representations in 2026.


Researchers at Georgia Tech found the same result in a study focused on large reasoning models. Safety alignment restored safety scores but degraded reasoning accuracy across three separate benchmarks. Crucially, the more safety training data was used, the worse the reasoning became, with accuracy dropping from 56.6% to 16.4%, a loss of 40.2 percentage points, as safety training volume increased.

How Does Safety Fine-Tuning Create Hallucinations and Sycophancy?

Safety fine-tuning introduces a second failure mode that compounds the accuracy problem: sycophancy, the tendency for models to prioritize user agreement over factual accuracy. Research published in 2023 by a team including Anthropic researchers established that RLHF-trained models systematically learn to favor confident, agreeable responses over rigorously correct ones, because human preference data collected during training tends to reward agreement.
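The preference-data mechanism can be sketched with a toy Bradley-Terry reward model, the pairwise formulation widely used for RLHF reward modeling. Everything here is a hypothetical assumption for illustration: each response is reduced to two binary features, `agrees_with_user` and `is_correct`, and the preference counts are synthetic, not drawn from any real dataset. If raters pick the agreeable answer over the correct one often enough, the fitted reward ends up scoring an agreeable-but-wrong response above a correct-but-disagreeing one, and a policy optimized against that reward inherits the bias.

```python
import math

# Hypothetical features per response: (agrees_with_user, is_correct), each 0 or 1.
# Each entry is (chosen_features, rejected_features, count). Counts are synthetic.
PREFERENCES = [
    # Agreeable-but-wrong vs. correct-but-disagreeing: agreement wins 70 of 100 times.
    ((1, 0), (0, 1), 70),
    ((0, 1), (1, 0), 30),
    # Both responses agree: the correct one wins 40 of 50 times.
    ((1, 1), (1, 0), 40),
    ((1, 0), (1, 1), 10),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_reward(prefs, lr=1.0, steps=5000):
    """Fit a linear Bradley-Terry reward r(x) = w . x by gradient ascent.

    Models P(chosen preferred over rejected) = sigmoid(r(chosen) - r(rejected))
    and maximizes the mean log-likelihood of the observed preferences.
    """
    w = [0.0, 0.0]
    total = sum(count for _, _, count in prefs)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for chosen, rejected, count in prefs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # d/dw log sigmoid(margin) = (1 - sigmoid(margin)) * diff
            scale = count * (1.0 - sigmoid(margin))
            grad = [g + scale * d for g, d in zip(grad, diff)]
        w = [wi + lr * g / total for wi, g in zip(w, grad)]
    return w


def reward(w, features):
    return sum(wi * xi for wi, xi in zip(w, features))


if __name__ == "__main__":
    w = fit_reward(PREFERENCES)
    print(f"weight on agreement: {w[0]:.2f}, weight on correctness: {w[1]:.2f}")
    print("agreeable-but-wrong beats correct-but-disagreeing:",
          reward(w, (1, 0)) > reward(w, (0, 1)))
```

With these synthetic counts the maximum-likelihood weights work out to ln(28/3) ≈ 2.23 for agreement and ln 4 ≈ 1.39 for correctness, so agreement earns more reward even though raters still prefer the correct answer when agreement is held fixed.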

When a model optimizes for approval rather than truth, it hallucinates more. Confident-sounding wrong answers score higher in human preference ratings when they match what an evaluator expects to hear. A comprehensive hallucination survey published in 2025 confirmed that RLHF "may prioritize coherence and confidence over factuality, which leads to hallucinated responses," a form of alignment-induced hallucination now treated as a first-class reliability risk in the research community.

This is an industry-wide dynamic. OpenAI rolled back a GPT-4o update in April 2025 after the model became overly flattering and agreeable, behavior the company itself described as sycophantic. Claude subscribers reporting that Opus, Sonnet, or Haiku models now agree with incorrect premises more readily, or hedge where they previously gave direct responses, are describing behavior consistent with this documented mechanism.

Why Did Anthropic Shut Down Fable 5 and Mythos 5 for Everyone, Not Just Foreign Users?

The export control directive issued by Commerce Secretary Howard Lutnick on June 12, 2026, specifically targeted foreign nationals, not Anthropic's entire user base. The order instructed Anthropic to suspend access to Fable 5 and Mythos 5 for any foreign national, whether inside or outside the United States, including Anthropic's own non-U.S. employees. Yet the result was a complete global shutdown of both models.

The gap between a nationality-based restriction and a worldwide cutoff comes down to a technical reality of how consumer AI platforms operate. Anthropic has no reliable way to verify a user's citizenship in real time at the scale of every API call and chat session. Checking an email address or billing country reveals neither passport nor legal status. Building nationality verification into a live API serving tens of millions of concurrent users would require the kind of document-scanning and biometric identity infrastructure that financial institutions and governments use for formal onboarding, a process that takes minutes or days, not milliseconds. Unable to enforce a selective restriction, Anthropic disabled both models for all customers globally to remain in compliance.
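The compliance logic described above can be sketched as a fail-closed access check. This is a hypothetical illustration, not Anthropic's implementation: the `User` fields, the model identifiers, and the `may_access` function are invented for the example. The key assumption is that the platform knows each user's billing country but has verified citizenship for no one, so a rule that admits only verified U.S. citizens ends up denying every user.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the two restricted models.
RESTRICTED_MODELS = {"fable-5", "mythos-5"}


@dataclass
class User:
    user_id: str
    billing_country: str  # known to the platform, but says nothing about citizenship
    verified_citizenship: Optional[str] = None  # None: never verified (the usual case)


def may_access(user: User, model: str) -> bool:
    """Fail-closed gate for a nationality-based restriction (hypothetical policy)."""
    if model not in RESTRICTED_MODELS:
        return True
    # billing_country is deliberately ignored: a U.S. card does not prove U.S. citizenship.
    # Without verified citizenship, the user must be treated as a possible foreign national.
    return user.verified_citizenship == "US"


if __name__ == "__main__":
    users = [User("u1", billing_country="US"), User("u2", billing_country="DE")]
    print([may_access(u, "mythos-5") for u in users])  # [False, False]
    print([may_access(u, "some-other-model") for u in users])  # [True, True]
```

Because `verified_citizenship` is `None` for every user, the restricted models become unavailable to everyone, which in practice amounts to a global shutdown; the only way to narrow the block is to populate that field through a slow identity-verification process.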

How Are Tech Companies Handling Identity Verification Tradeoffs?

This tradeoff between real-time user identity verification and platform privacy is already reshaping the broader technology industry. As of early 2026, 25 U.S. states, the United Kingdom, Australia, and Spain have enacted laws requiring age verification for access to certain online content, forcing platforms to choose between implementing identity systems that collect sensitive personal data or blocking access altogether.

Consider the structural risks and practical implications of expanded identity collection:

  • Privacy Risk: The Electronic Frontier Foundation has flagged that the more places personal data passes through, the higher the probability of misuse or breach.
  • Policy Transparency: Anthropic's updated privacy policy, effective July 8, 2026, acknowledges that age and identity verification data may now be collected for security purposes, while reaffirming that the company does not sell user data and keeps Claude ad-free.
  • Operational Reality: The export control directive poses an intensified version of the same question, applied to citizenship rather than age, forcing a structural choice between redesigning platforms around real-time identity collection and blocking everyone.

What Does Anthropic's Track Record Show About Quality Shifts?

Anthropic's own recent history demonstrates how easily model quality can shift. In April 2026, Anthropic published a postmortem after weeks of user complaints that Claude Code had become noticeably worse at coding tasks. The company traced the degradation to changes in how the model was fine-tuned, illustrating that even internal adjustments intended to improve safety or compliance can have unintended consequences on core performance.

The alignment tax is not a new discovery. Responsible AI Labs documented this gradient conflict across GPT, LLaMA, Mistral, and Gemma model families, finding that safety degradation appears in roughly 73% of fine-tuning runs, even when training data is entirely clean and benign. The pattern holds consistently across MMLU (a widely used knowledge benchmark), code generation, mathematical reasoning, and instruction-following evaluations.

For Claude users experiencing degraded performance this weekend, the research provides both validation and context. If the degradation does stem from new safety fine-tuning, it would reflect not a failure of Anthropic's engineering but a documented, measurable tradeoff that the entire AI industry faces when balancing safety constraints against raw capability. Whether Anthropic applied new safety constraints to Opus, Sonnet, or Haiku in response to the government's June 12 export control directive remains publicly unconfirmed, but the user-reported accuracy complaints are consistent with a well-understood mechanism that researchers have been documenting and quantifying since at least 2022.