Why AI's Top Researchers Are Abandoning Safety for Pre-Training
When two of AI's most respected researchers both bet on pre-training over safety work, it signals where the field thinks the next breakthrough lives. Andrej Karpathy, a co-founder of OpenAI and former director of AI at Tesla, has joined Anthropic's pre-training team alongside Ilya Sutskever, marking one of 2026's most significant AI talent moves. The decision reveals a pragmatic pivot at Anthropic: you cannot align a foundation model that isn't capable enough in the first place.
What Does This Talent Move Actually Signal?
For the past two years, the AI field has been obsessed with post-training gains. Reinforcement Learning from Human Feedback (RLHF), a technique that uses human feedback to fine-tune models after initial training, made models more helpful. Constitutional AI, Anthropic's safety framework that guides model behavior using written principles, made them safer. Synthetic data fine-tuning made them more capable without requiring new pre-training runs. All of these are refinements to existing foundations.
Karpathy's move signals something different: the next order-of-magnitude improvement won't come from better refinement. It will come from better foundations. Pre-training is the foundational layer where models learn language, reasoning, and world knowledge from raw data. By hiring Karpathy and Sutskever, both pre-training experts, Anthropic is betting that the next leap will happen at this layer rather than in post-training.
This is particularly striking because Anthropic was built on a safety-first positioning. The company raised billions on the promise that it would build AI that doesn't cause harm. But hiring Karpathy for pre-training makes a blunter argument: safety without capability is irrelevant. Constitutional AI only works if the base model is capable enough to understand the principles in its constitution in the first place.
Why Is the Pre-Training Talent Pool So Competitive?
The AI talent war has three distinct fronts, and pre-training expertise sits at the most critical one:
- Pre-training expertise: The people who know how to train models at scale from raw data, the most foundational and difficult work
- Post-training expertise: The people who make models useful and safe through refinement techniques like RLHF and Constitutional AI
- Product expertise: The people who turn models into revenue-generating products and services
Karpathy is front-line pre-training talent. So is Sutskever. By hiring both, Anthropic is deepening its bench in the scarcest skill in AI. OpenAI lost two of its founding pre-training minds. Google DeepMind has Demis Hassabis, but he is increasingly a public face. Meta has Yann LeCun, but he is focused on open science, not product work. xAI has Elon Musk, but he is not a researcher. The pre-training talent pool is shallow, and Anthropic just made two major hires from it.
How to Understand the Strategic Implications
- For Anthropic: This is a capability bet. If Karpathy and Sutskever deliver a pre-training breakthrough, Anthropic's models could leapfrog OpenAI's next generation, making the safety positioning secondary to raw capability
- For OpenAI: This is a talent loss that signals their pre-training moat may not be as deep as they thought. If Karpathy believed OpenAI had the best pre-training setup, he might have gone back there instead
- For the field: The centre of gravity is shifting from "how do we make AI safe?" to "how do we make AI capable enough that safety matters?" This is a subtle but important reframing of what alignment research should prioritize
Karpathy's trajectory illustrates this shift. He co-founded OpenAI in 2015 with Sutskever and others, left for Tesla in 2017 to run Autopilot AI, left Tesla in 2022, briefly returned to OpenAI in 2023, left again in 2024 to found the AI education startup Eureka Labs, and now, in 2026, joins Anthropic, reuniting with Sutskever. This isn't just a talent acquisition. It's a statement about which layer of the stack Anthropic expects to decide the next round of competition.
The practical implication is clear: safety and capability are entangled at the pre-training layer. You cannot align a model that lacks the foundational reasoning ability to understand what alignment means. The next safety breakthrough might come from a pre-training breakthrough, not from better RLHF or Constitutional AI techniques applied to an already-trained model.
For researchers and students considering AI careers, pre-training experience on a resume just went up in value. The companies that figure out pre-training breakthroughs will define the next decade. The ones that don't will be refining someone else's foundations.