Jensen Huang's Bold Vision: Why Every Device Will Soon Run AI Agents

FrontierNews.ai AI Research Desk

Jensen Huang believes the future of computing revolves around a single repeatable pattern: AI agents that reason, remember, and use tools the same way whether they sit in a data center or on your laptop. At Computex 2026, Nvidia's CEO outlined how this "agent architecture" will transform every edge device, from self-driving cars to humanoid robots, fundamentally reshaping how we interact with technology.

What Is This New Computing Pattern Huang Keeps Describing?

Huang described a unified blueprint that begins with training and inference in the cloud, then pushes outward to everything else. "Every edge device will become autonomous. Every edge device will have agentic systems," he explained to reporters at a press gaggle. The pattern treats a self-driving car, a smartphone, a robot, and a satellite imaging system as variations on the same underlying architecture, just running on different hardware.

To illustrate the concept, Huang highlighted Nvidia's Alpamayo driving stack, which he described as a system that reasons in language rather than reacting to images. "That's how autonomous vehicles are going to work in the future," he said. "It's essentially that agentic computing pattern with a physical AI model." Under the same pattern, a system could read a "skill file" and watch a tutorial video to operate unfamiliar machinery the way a person would.

How Is Nvidia Redesigning Its Hardware to Support This Vision?

Nvidia is building two new processors to anchor this agent-first future. On the data center side, the company has moved Vera, an 88-core Arm processor, into full production. Vera is designed specifically for agents rather than human users. "We built Vera for agents to use," Huang said. "Until six months ago, there were no agents, so that's the definition of a $0 billion market."

The reasoning behind Vera's design reflects a fundamental shift in how Nvidia thinks about computing. Traditional hyperscale CPUs pile on cores because humans lease them by the hundred. Agents, by contrast, don't want to rent CPU cores; they want to generate tokens quickly. This pushed Nvidia toward single-thread speed and memory bandwidth over raw core count. Huang claimed Vera offers the largest step up in single-threaded performance he has seen "in 25 years."

The claimed performance gains are substantial. Nvidia cites 1.8 times faster task completion than x86 processors and a 1.5 times instructions-per-clock gain over its Grace predecessor. A 256-chip liquid-cooled Vera rack reaches six times the throughput of a conventional CPU rack, according to the company. Early customers include Anthropic, OpenAI, xAI, ByteDance, CoreWeave, and Oracle. Nvidia's CFO Colette Kress told investors the company sees "nearly $20 billion in total CPU revenue this year."

For consumer devices, Nvidia introduced RTX Spark, which Huang calls "the first real rethink of the PC in four decades." The top RTX Spark chip pairs a 20-core Arm CPU built by MediaTek with a Blackwell GPU carrying 6,144 CUDA cores, up to 128GB of LPDDR5X unified memory, and a 600 GB/s NVLink-C2C link, all on TSMC's 3-nanometer node. Fall 2026 laptops are confirmed from Microsoft, Dell, HP, ASUS, Lenovo, and MSI, with Acer and Gigabyte to follow.

Why Does Latency Matter So Much to Agents?

Huang repeatedly emphasized that agents operate at a fundamentally different timescale than humans. "Humans are more patient than agents. Agents, they're working at nanosecond scale, not second scale," he explained. This impatience drives design decisions across both Vera and RTX Spark. For RTX Spark, Huang justified the high-performance specs by arguing that software touching the machine, from Adobe to Blender, "cannot be slow" because an agent driving the machine won't tolerate delays.

What Are the Key Hardware and Software Innovations Supporting This Shift?

Nvidia is deploying several technologies to maximize efficiency and performance for agent-driven systems:

NVFP4 Precision Format: Nvidia's 4-bit floating-point format scales between four, eight, 16, and 32 bits and roughly doubles the number of parameters that fit in a given memory pool compared with 8-bit weights, allowing RTX Spark to hold larger models in its 128GB of memory.
Neural Texture Compression: This technique cuts game texture memory by up to eight times in Nvidia's demonstrations, reducing memory pressure on edge devices.
Custom Olympus Cores: Vera uses Nvidia's own custom Olympus design, its first ground-up server core since the Denver and Carmel projects, optimized for single-threaded performance rather than core count.
MediaTek Cortex Cores: RTX Spark uses Arm's off-the-shelf Cortex reference designs licensed through MediaTek, balancing performance with time-to-market.
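
The memory claim behind the 4-bit format comes down to simple arithmetic, which can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not Nvidia's own sizing method: the function name max_params is hypothetical, memory is counted in decimal gigabytes, and the sketch assumes the entire 128GB pool holds weights, ignoring any per-block scaling metadata, activations, and other runtime overhead that a real deployment would need.

```python
def max_params(memory_gb: float, bits_per_param: int) -> float:
    """Upper bound on how many parameters fit in memory_gb (decimal GB).

    Assumption: every bit of memory stores weights; scaling metadata,
    activations, and caches are ignored, so real capacity is lower.
    """
    total_bits = memory_gb * 1e9 * 8  # bytes -> bits
    return total_bits / bits_per_param


MEMORY_GB = 128  # top RTX Spark configuration cited in the article

for bits in (16, 8, 4):
    billions = max_params(MEMORY_GB, bits) / 1e9
    print(f"{bits:>2}-bit weights: ~{billions:.0f}B parameters")
```

Under these assumptions, halving the bits per weight from eight to four doubles the ceiling on parameter count, which is the effect the NVFP4 item describes.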

Huang signed an HBM4E wafer at SK hynix's booth with the words "Please Make More," underscoring Nvidia's ongoing memory supply constraints. "We have enough supply for very robust growth. However, we are supply constrained," Huang acknowledged.

When Will These Devices Actually Reach Consumers?

RTX Spark laptops are slated to ship in fall 2026, with confirmed models from major manufacturers. Vera is already in full production and shipping to major AI labs and cloud providers. However, Huang declined to commit to bringing Nvidia's custom Olympus cores to Windows PCs anytime soon. "Our preference is to use off-the-shelf cores whenever we can, because Arm also builds good cores," he said. The first PC chip using Nvidia's own cores isn't expected until 2028.

Morgan Stanley estimates Vera at around $5,000 per socket inside a vertically integrated rack, positioning it as a premium offering for hyperscalers and AI research labs rather than a mainstream consumer product.

What Does This Mean for the Future of Computing?

Huang's vision represents a fundamental departure from the PC era, where devices were tools that humans controlled. In the agent era, devices become systems that operate autonomously, making decisions and taking actions with minimal human intervention. This shift affects everything from how chips are designed to how software is optimized. The emphasis on latency, memory bandwidth, and single-threaded performance reflects the reality that agents can't afford to wait for responses the way humans can.

The breadth of Nvidia's pitch, spanning data centers, laptops, cars, robots, and satellites, suggests the company sees this agent pattern as the defining computing paradigm for the next decade. Whether that vision materializes depends on whether AI agents actually become as ubiquitous and autonomous as Huang predicts, and whether developers can build software that takes full advantage of the hardware Nvidia is putting in place.
