FrontierNews.ai

Harvard Professor Says AI Alignment Research Could Make Things Worse, Not Better

A prominent AI safety researcher is challenging the conventional wisdom that alignment research is the best path to safer AI systems. Stephen Casper, an incoming tenure-track professor of public policy at Harvard Kennedy School, argues that excessive focus on making AI systems aligned with their creators' goals could actually prove counterproductive, and that governance and regulation offer a more promising approach to managing AI risks.

Why Would an AI Safety Expert Oppose Alignment Research?

Casper's position represents a significant departure from mainstream AI safety thinking. While he supports slowing down AI development overall, he takes the unusual stance that the field should reduce investment in alignment and superalignment research. His concern is not that an artificial superintelligence will turn against humanity in a dramatic takeover, but that human agency and power will gradually erode.

Casper earned his PhD in computer science at MIT, where his research focused on model tampering attacks, a technique for evaluating and defending AI systems by testing how they behave when their internal computations are manipulated. His academic work spans AI safety, interpretability, and security. His Google Scholar profile lists 61 papers and over 5,000 citations.

"I would like to halt the research enterprise around making super intelligent systems intelligently aligned with their creators' goals," Casper stated.

Stephen Casper, Incoming Tenure-Track Professor of Public Policy at Harvard Kennedy School

His reasoning centers on what he calls the "mainline doom scenario," which differs sharply from extinction-focused narratives. Rather than worrying about rogue superintelligence, Casper is concerned about a handful of companies wielding extremely powerful AI systems while the general public becomes increasingly disempowered and dependent on those systems.

What Is Casper's Alternative Vision for AI Safety?

Instead of alignment research, Casper advocates for what he describes as an "AI governance hawk" approach. This includes taxes and regulation designed to keep frontier AI labs in check and prevent excessive concentration of power. He argues that governance failures, rather than technical misalignment, represent the primary threat to human flourishing in an AI-driven world.

Casper's research trajectory illustrates his evolving perspective. He entered the AI safety field in 2018 after reading Nick Bostrom's "Superintelligence," initially adopting the mainstream rationalist and effective altruism frameworks focused on existential risk. In 2022 and 2023, however, he began to question whether the AI field was bottlenecked less by technical risk-management tools than by governance capacity.

How Does Casper's Technical Work Support His Governance Focus?

Casper's technical agenda centers on making AI safeguards robust enough to work effectively with open-weight models, even when adversarial users attempt to fine-tune them on harmful data. His work on model tampering attacks provides tools for evaluating how well AI systems resist manipulation of their internal states and weights.

Key aspects of his technical and policy approach include:

  • Model Tampering Defenses: Training and testing AI models under adversarial conditions where their internal computations are manipulated to assess and strengthen their resistance to harmful behavior.
  • Open Model Robustness: Developing safeguards that work reliably for openly available models, not just proprietary systems controlled by major companies.
  • Governance-First Strategy: Prioritizing regulatory frameworks and institutional oversight over technical alignment solutions as the primary lever for managing AI risks.
  • Power Concentration Prevention: Advocating for policies that prevent a small number of companies from wielding disproportionate influence over AI systems that affect billions of people.

Casper has contributed to this work through roles at the UK AI Safety Institute and the Center for Human-Compatible AI at Berkeley. His research has earned recognition including the ML Safety Workshop Best Paper Award and outstanding paper finalist distinctions in TMLR, a peer-reviewed machine learning journal.

What Does Casper Mean by the "Idiocracy-Inspired" Scenario?

Casper's mainline doom scenario draws inspiration from the film "Idiocracy," envisioning a gradual disempowerment of the general population rather than a sudden catastrophic event. He suggests that poor governance of AI systems could lead to widespread sycophancy, where AI systems tell people what they want to hear rather than providing accurate information, ultimately leaving large segments of the population behind.

His analysis points to real-world examples of governance failures in AI development. He contrasts DALL-E 2, OpenAI's image generation system, which was initially offered only through limited access, with Stable Diffusion, whose model weights were released publicly. He notes how these different governance approaches led to different outcomes in terms of system capabilities and societal impact.

Casper estimates that under current trajectories, approximately 84% of the population could be left behind by AI advances, unable to effectively compete with or understand the systems reshaping their world. This concern about widespread disempowerment, rather than extinction, drives his advocacy for stronger governance frameworks.

Steps to Implement Stronger AI Governance

While Casper's specific policy proposals are still developing, his framework suggests several governance-focused approaches:

  • Regulatory Taxation: Implementing taxes on frontier AI development to slow deployment and fund oversight mechanisms.
  • Public Sector Capacity: Building government expertise and institutional capacity to understand and regulate AI systems effectively.
  • Open Model Safety: Ensuring that openly available AI models have robust safeguards to prevent misuse, rather than relying solely on proprietary systems.
  • Transparency Requirements: Mandating disclosure of how AI systems are trained, deployed, and governed to enable public accountability.

Casper's position challenges the AI safety community to reconsider its priorities. While alignment research remains important to many researchers, his argument suggests that without strong governance structures, even well-aligned AI systems could concentrate power in ways that harm human autonomy and opportunity. His incoming tenure-track position at Harvard Kennedy School signals growing institutional recognition that AI governance deserves serious academic attention alongside technical safety research.