NVIDIA's Surprise Move Into CPUs Signals a Seismic Shift in AI Infrastructure

FrontierNews.ai AI Research Desk

NVIDIA's entry into the CPU market marks a watershed moment for artificial intelligence infrastructure. For two decades, NVIDIA built its empire on graphics processors (GPUs), but the company now recognizes that the next phase of AI requires a fundamental rebalancing of computing power. On June 1st at the GTC Taipei 2026 conference, NVIDIA unveiled Vera, its first standalone CPU line based on ARM architecture, with OpenAI and Anthropic already signed on as early customers.

Why Are CPUs Suddenly Critical to AI?

The shift reflects a dramatic change in how AI systems actually work in the real world. During the training phase, when companies build large language models from scratch, GPUs handle roughly 70 to 85 percent of the computational load. But once those models move into production and begin performing complex reasoning tasks, the balance flips entirely.

In what the industry calls "Agentic AI," where systems break down complex problems into multiple steps, call external tools, search databases, and manage intermediate results, CPUs now handle over 70 percent of the workload, up from just 10 to 30 percent during training. This happens because agent tasks generate enormous amounts of intermediate data, known as KV Cache (Key-Value Cache), that exceeds what GPU memory can hold. A single complex agent task can easily produce more data than NVIDIA's flagship H100 GPU can store in its 80 gigabytes of onboard memory.

"In the era of AI agents, the CPU has become a key bottleneck in data center performance, and the speed of token production in AI factories cannot be slowed down by the CPU," stated Jensen Huang, NVIDIA CEO.

When the KV cache outgrows GPU memory, the system offloads it to the CPU, which can be equipped with external memory reaching several terabytes, one to two orders of magnitude more than a GPU holds. CPUs are therefore no longer just handling scheduling and data loading; they now manage massive memory pools and perform critical reasoning operations.
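To make the memory argument concrete, here is a minimal sketch of the commonly used KV cache size estimate for a transformer (one key tensor and one value tensor per layer, per token). The model shape, context length, and number of concurrent tasks are hypothetical values chosen for illustration, not figures from NVIDIA or any specific model; only the 80 GB capacity comes from the article.

```python
GPU_MEMORY_BYTES = 80 * 10**9  # 80 GB of onboard memory, as cited for the H100


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size: keys and values for every layer and token."""
    # The leading 2 counts one key tensor plus one value tensor per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


def spill_to_host_bytes(cache_bytes, gpu_bytes=GPU_MEMORY_BYTES):
    """Bytes that do not fit on the GPU and must live in CPU-attached memory."""
    return max(0, cache_bytes - gpu_bytes)


# Hypothetical agent workload: 80 layers, 8 KV heads of dimension 128,
# a 128,000-token context, 4 concurrent tasks, 16-bit values.
cache = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                       seq_len=128_000, batch_size=4)
spill = spill_to_host_bytes(cache)

print(f"KV cache: {cache / 1e9:.1f} GB")        # KV cache: 167.8 GB
print(f"Spills to host: {spill / 1e9:.1f} GB")  # Spills to host: 87.8 GB
```

The estimate is optimistic for the GPU, since it ignores the model weights and activations that also occupy the same 80 GB.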

How Big Is This Market Opportunity?

The financial implications are staggering. UBS predicted in a recent semiconductor industry report that the total addressable market for server CPUs will grow from approximately 30 billion dollars in 2025 to about 170 billion dollars by 2030, a more than fivefold increase in five years. AMD CEO Lisa Su doubled her company's market size forecast for server CPUs from 60 billion dollars to over 120 billion dollars, raising the expected compound annual growth rate from 18 percent to 35 percent.
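The growth figures above can be checked with the standard compound annual growth rate formula. The sketch below applies it to the UBS endpoints quoted in the article; the five-year horizon used to compare the 18 and 35 percent rates is an assumption for illustration, since the article does not state the period AMD's forecast covers.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoints."""
    return (end_value / start_value) ** (1 / years) - 1


def growth_multiple(rate, years):
    """Total growth factor after compounding `rate` for `years` years."""
    return (1 + rate) ** years


# UBS endpoints from the article, in billions of dollars.
ubs_multiple = 170 / 30
ubs_cagr = cagr(30, 170, 5)
print(f"UBS forecast: {ubs_multiple:.2f}x overall, {ubs_cagr:.1%} per year")
# UBS forecast: 5.67x overall, 41.5% per year

# Assumed five-year horizon, for comparison only.
for rate in (0.18, 0.35):
    print(f"{rate:.0%} per year compounds to {growth_multiple(rate, 5):.2f}x")
# 18% per year compounds to 2.29x
# 35% per year compounds to 4.48x
```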

This surge is already visible in pricing. Intel and AMD have implemented rare industry-wide price increases of 10 to 15 percent for server CPUs, breaking a decade-long trend of declining prices and more performance for less money. AMD is capturing premium pricing power through high-core-count products; in the first quarter of 2026, AMD held only 33.2 percent of unit shipments but generated 46.2 percent of revenue, while Intel held 66.8 percent of units but only 53.8 percent of revenue.
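One way to read the share figures above is to divide each vendor's revenue share by its unit share, which gives its average revenue per unit relative to the combined average. This is a rough sketch that treats the two vendors as the whole market, as the quoted shares (which sum to 100 percent) suggest; the function name is illustrative.

```python
def relative_revenue_per_unit(revenue_share, unit_share):
    """Average revenue per unit, relative to the market-wide average (1.0)."""
    return revenue_share / unit_share


# First-quarter 2026 shares quoted in the article, in percent.
amd = relative_revenue_per_unit(revenue_share=46.2, unit_share=33.2)
intel = relative_revenue_per_unit(revenue_share=53.8, unit_share=66.8)

print(f"AMD:   {amd:.2f}x the average revenue per unit")    # AMD:   1.39x ...
print(f"Intel: {intel:.2f}x the average revenue per unit")  # Intel: 0.81x ...
print(f"AMD vs Intel: {amd / intel:.2f}x")                  # AMD vs Intel: 1.73x
```

On these numbers, an average AMD server CPU brought in roughly 1.7 times the revenue of an average Intel one during the quarter.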

What Does This Mean for Data Centers?

The classic ratio of CPUs to GPUs in AI servers is rapidly changing. Traditional AI training setups used roughly one CPU for every eight GPUs. For agent deployments, that ratio is converging toward one-to-one, which implies up to eight times as many CPUs for the same number of GPUs. Intel management stated during its first-quarter 2026 earnings call that the number of CPU cores required per gigawatt of power may increase from approximately 30 million today to 120 million in the era of AI agents.
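The ratios in this section translate directly into CPU counts. The following sketch works through them for a hypothetical 1,024-GPU cluster; the cluster size and helper name are assumptions for illustration, while the 1:8 and 1:1 ratios and the per-gigawatt core counts come from the article.

```python
import math


def cpus_needed(num_gpus, gpus_per_cpu):
    """CPUs required to serve `num_gpus` at a given GPUs-per-CPU ratio."""
    return math.ceil(num_gpus / gpus_per_cpu)


cluster_gpus = 1024  # hypothetical cluster size

training_cpus = cpus_needed(cluster_gpus, gpus_per_cpu=8)  # 1 CPU : 8 GPUs
agent_cpus = cpus_needed(cluster_gpus, gpus_per_cpu=1)     # 1 CPU : 1 GPU

print(f"Training layout: {training_cpus} CPUs")                  # 128 CPUs
print(f"Agent layout:    {agent_cpus} CPUs")                     # 1024 CPUs
print(f"CPU multiple:    {agent_cpus / training_cpus:.0f}x")     # 8x

# Intel's per-gigawatt core estimate, from the article.
cores_multiple = 120_000_000 / 30_000_000
print(f"Cores per gigawatt: {cores_multiple:.0f}x")              # 4x
```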

The workload intensity is also exploding. IDC predicts that the global number of agent-executed tasks per year will grow from about 44 billion in 2025 to over 400 trillion by 2030. Token consumption for AI deployment in agent mode is typically 20 to 30 times that of ordinary conversations, as a single user interaction often involves dozens of tool calls and intermediate reasoning steps.

How Are CPU Manufacturers Responding?

Both Intel and AMD have undergone intensive technological upgrades to meet this demand. Core counts in server CPUs have risen from 28 cores in 2017 to 288 cores in Intel's Clearwater Forest and 256 cores in AMD's Venice in 2026, a roughly tenfold increase in density. These advances, combined with new interconnect standards like CXL 4.0 (Compute Express Link), allow multiple CPUs to share large-capacity memory pools, reducing the overhead of moving data between chips.

A November 2025 paper from Intel and Georgia Tech, "A CPU-Centric Perspective on Agentic AI," tested five typical agent workloads and found that CPU-side tool processing accounted for 43.8 to 90.6 percent of total latency. These figures support what industry insiders have been observing: GPU utilization in agent tasks is generally below 50 percent, far lower than the 70 to 85 percent seen in traditional inference services.
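A team wanting to measure a similar latency split in its own agent loop could time each phase separately. The sketch below is not the method used in the paper; it is a minimal illustration using only the Python standard library, and `run_tool` and `run_model` are hypothetical stand-ins for CPU-side tool processing and GPU-side model inference.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyBreakdown:
    """Accumulates wall-clock time per category and reports each share."""

    def __init__(self):
        self.seconds = defaultdict(float)

    @contextmanager
    def track(self, category):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[category] += time.perf_counter() - start

    def share(self, category):
        total = sum(self.seconds.values())
        return self.seconds.get(category, 0.0) / total if total else 0.0


def run_tool():   # hypothetical stand-in for a CPU-side tool call
    time.sleep(0.02)


def run_model():  # hypothetical stand-in for a GPU-side inference call
    time.sleep(0.01)


breakdown = LatencyBreakdown()
for _ in range(3):  # three agent steps, each one tool call plus one model call
    with breakdown.track("tool"):
        run_tool()
    with breakdown.track("model"):
        run_model()

print(f"Tool share of latency: {breakdown.share('tool'):.1%}")
```

Wall-clock timing of this kind only attributes latency to phases; it does not by itself show whether the GPU sat idle during tool calls, which would need utilization counters from the hardware.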

Steps to Understanding the CPU-GPU Rebalancing in Your Organization

Assess Your Workload Type: Determine whether your AI applications focus on model training, inference, or agent-based reasoning. Training-heavy workloads still benefit from GPU-dominant architectures, while agent deployments require substantial CPU investment.
Evaluate Memory Requirements: Calculate the intermediate data your systems generate during complex reasoning tasks. If KV Cache exceeds your GPU memory capacity, plan for CPU-based memory expansion and CXL interconnection standards.
Plan Infrastructure Upgrades: Budget for increased CPU core counts and higher-capacity memory pools. The CPU-to-GPU ratio in your data centers may need to shift from 1:8 toward 1:1 for agent deployments.
Monitor Vendor Roadmaps: Track releases from NVIDIA (Vera), AMD, and Intel as they optimize for agent workloads. Pricing power is shifting toward high-core-count CPUs, so early adoption decisions will affect long-term costs.

NVIDIA's move into CPUs is not a defensive reaction to competition; it is a recognition that the company's future depends on providing complete systems for the next generation of AI applications. By building its own CPU line, NVIDIA can optimize the entire stack for agent workloads, just as it has done with GPUs and software for training. The 170 billion dollar CPU market opportunity by 2030 represents one of the largest infrastructure shifts in computing history, and NVIDIA is positioning itself to capture a significant share.

For enterprises planning AI infrastructure investments, the message is clear: the era of GPU-centric data centers is giving way to balanced CPU-GPU architectures optimized for reasoning and tool use. Companies that understand this transition early will avoid costly infrastructure overruns and position themselves to deploy the next generation of AI agents efficiently.
