Why NVIDIA's New Vera Chip Could Reshape Enterprise AI Beyond Blackwell
NVIDIA and Dell have introduced a new generation of AI infrastructure that could fundamentally shift how enterprises deploy artificial intelligence, moving away from cloud-based models toward on-premises systems that cost significantly less to operate at scale. The centerpiece is the Dell PowerEdge XE9812 built on NVIDIA's Vera Rubin NVL72 architecture, which delivers up to 10 times lower cost-per-token than Blackwell for large-scale agentic AI inference. Cost-per-token measures the expense of processing a single token, the basic unit of text a model reads or generates.
The announcement comes as NVIDIA CEO Jensen Huang declared that artificial intelligence demand has reached an inflection point. "We've now arrived at the era of useful AI, which is the reason why demand is going parabolic, utterly parabolic," Huang stated at Dell Technologies World on May 18. This language signals a shift in how the industry views AI maturity, moving from experimental pilots to production workloads that enterprises depend on daily.
What's Driving the Move Away From Cloud AI?
Enterprise behavior is changing faster than many expected. According to Dell's internal survey, 67 percent of AI workloads now run outside the cloud, and 88 percent of respondents run at least one AI workload on-premises. This represents a fundamental reversal from the cloud-first mentality that dominated enterprise technology for the past decade. Companies including pharmaceutical giant Eli Lilly, Samsung, and Honeywell are already running AI workloads on Dell AI Factories powered by NVIDIA technology, among a total of 5,000 enterprises in production deployment.
The reasons are practical. On-premises deployment allows companies to keep proprietary model weights and sensitive business data within their own data centers, reducing exposure to third-party cloud providers. NVIDIA Confidential Computing, a security layer built into the new systems, protects model intellectual property and enterprise data end-to-end, addressing a critical concern for organizations handling regulated information in industries like pharmaceuticals and finance.
How Does Vera Outperform Blackwell for Enterprise Workloads?
The Vera CPU, NVIDIA's processor purpose-built for agentic AI workloads, delivers measurable performance advantages in the specific tasks enterprises care about most. The chip completes agentic workloads 50 percent faster than traditional x86 processors, and when paired with Starburst's data query engine, it delivers 3 times faster query throughput for large-scale SQL analytics. These improvements matter because agentic AI, which cycles between data queries, sandboxed code execution, and model inference, requires rapid context switching that traditional CPUs struggle with.
The new Dell PowerEdge servers built on NVIDIA HGX Rubin NVL8 support up to 144 GPUs per rack with 100 percent direct liquid-cooled compute nodes, delivering 5.5 times the performance of the previous HGX B200 generation. Vera's 1.2 terabytes per second memory bandwidth, introduced across Dell PowerEdge M9822 and R9822 servers, enables the rapid data movement that inference-heavy workloads demand.
Steps to Evaluate On-Premises AI Infrastructure for Your Enterprise
- Assess Your Workload Profile: Determine whether your AI use cases are inference-heavy and long-running, which favors on-premises deployment with Vera Rubin systems, or training-intensive, which may still benefit from cloud elasticity.
- Calculate Total Cost of Ownership: Weigh the up to 10 times lower cost-per-token of Vera Rubin NVL72 relative to Blackwell against your current cloud spending, factoring in on-premises infrastructure, cooling, and staffing costs over a three- to five-year period.
- Evaluate Data Security Requirements: If your organization handles regulated data or proprietary models, investigate NVIDIA Confidential Computing capabilities to understand how end-to-end encryption protects model weights and enterprise data within your data center.
- Plan for Token Consumption Growth: Dell projects token consumption will grow 3,400 percent by 2030, so design infrastructure with headroom for scaling agentic AI workloads as they move from pilot to production.
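The cost and growth steps above can be sketched as a back-of-the-envelope calculation. This is a minimal illustration, not a Dell or NVIDIA sizing method: the only figure taken from the article is the 3,400 percent growth projection. The five-year horizon, the constant yearly growth rate, the starting token volume, the cloud price, and the on-premises capex and opex are all hypothetical placeholders to replace with your own numbers.

```python
def growth_multiplier(percent_growth: float) -> float:
    """Turn a percentage increase into a multiplier on the starting volume."""
    return 1.0 + percent_growth / 100.0


def annual_growth_rate(multiplier: float, years: int) -> float:
    """Constant yearly rate that compounds to `multiplier` after `years` years."""
    return multiplier ** (1.0 / years) - 1.0


def project_tokens(base_tokens: float, rate: float, years: int) -> list[float]:
    """Yearly token volumes, starting at `base_tokens` in year 0."""
    return [base_tokens * (1.0 + rate) ** y for y in range(years)]


def cloud_cost(tokens_per_year: list[float], price_per_million_tokens: float) -> float:
    """Pay-per-token spend summed over all projected years."""
    return sum(t / 1e6 * price_per_million_tokens for t in tokens_per_year)


def on_prem_cost(capex: float, annual_opex: float, years: int) -> float:
    """Up-front hardware plus yearly cooling, power, and staffing."""
    return capex + annual_opex * years


if __name__ == "__main__":
    HORIZON_YEARS = 5  # assumption: the article only says "by 2030"

    multiplier = growth_multiplier(3400)  # projection quoted in the article
    rate = annual_growth_rate(multiplier, HORIZON_YEARS)

    # All inputs below are hypothetical placeholders, not vendor figures.
    tokens = project_tokens(base_tokens=50e9, rate=rate, years=HORIZON_YEARS)
    cloud = cloud_cost(tokens, price_per_million_tokens=2.00)
    on_prem = on_prem_cost(capex=3_000_000, annual_opex=400_000, years=HORIZON_YEARS)

    print(f"growth multiplier: {multiplier:.0f}x")
    print(f"implied yearly growth: {rate:.1%}")
    print(f"cloud spend over horizon: ${cloud:,.0f}")
    print(f"on-prem spend over horizon: ${on_prem:,.0f}")
```

A 3,400 percent increase works out to a 35x multiplier, because the increase is added on top of the starting volume. The on-premises figure here assumes the hardware is sized once for the whole horizon; a phased purchase would need a capex schedule instead of a single number.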
What Does This Mean for AI Infrastructure Spending?
The financial implications are staggering. Worldwide AI infrastructure spending is projected to reach 3 to 4 trillion dollars by 2030, with token consumption growth driving much of that expansion. If the shift from pilot to production continues at its current pace, and if enterprises validate that most AI value accrues on-premises rather than in the cloud, the infrastructure market could be substantially larger than previous estimates.
However, the success of this vision depends on agentic AI proving reliable in multi-step tasks. If enterprises discover that agentic systems are less dependable than current benchmarks suggest, they may throttle deployments, which would slow infrastructure demand growth. Additionally, the 10 times cost advantage assumes workloads are inference-heavy and long-running, a profile that fits some but not all enterprise use cases.
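The workload-profile caveat can be made concrete with a short sketch. It assumes, purely for illustration, that the 10 times cost advantage applies only to the share of spending that fits the inference-heavy, long-running profile and that the remainder sees no saving at all. The `fit_fraction` parameter is a hypothetical input, not a figure from Dell or NVIDIA.

```python
def blended_cost_ratio(fit_fraction: float, advantage: float = 10.0) -> float:
    """Cost of the whole workload relative to the baseline (1.0 = no saving).

    fit_fraction: share of baseline spend that matches the favorable profile.
    advantage: cost reduction factor on that share (10.0 is the article's
               headline figure; everything else is assumed to cost the same).
    """
    if not 0.0 <= fit_fraction <= 1.0:
        raise ValueError("fit_fraction must be between 0 and 1")
    if advantage <= 0:
        raise ValueError("advantage must be positive")
    return (1.0 - fit_fraction) + fit_fraction / advantage


def effective_advantage(fit_fraction: float, advantage: float = 10.0) -> float:
    """Overall cost reduction factor once the non-fitting share is included."""
    return 1.0 / blended_cost_ratio(fit_fraction, advantage)


if __name__ == "__main__":
    for share in (0.25, 0.5, 0.75, 1.0):
        print(f"{share:.0%} of spend fits: {effective_advantage(share):.2f}x overall")
```

Under these assumptions, an enterprise with only half of its spend in the favorable profile would see an overall advantage of roughly 1.8 times rather than 10, which is why the workload assessment matters before the headline number is used in a budget.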
Who's Already Betting on This Infrastructure?
Major enterprises are moving quickly. Honeywell's Chief Technology Officer Suresh Venkatarayalu walked through the company's transition from public cloud to on-premises AI using the Dell AI Factory for industrial AI use cases, digital twins, and automation spanning from data center to the edge. Hudson River Trading, a quantitative trading firm, is expanding its Dell deployment to power AI-driven research with Dell PowerEdge XE9685L servers. Eli Lilly's executive vice president Diogo Rau joined the keynote to discuss how the pharmaceutical company is leveraging AI infrastructure deployed at scale with Dell and NVIDIA for life sciences advancements.
Dell has also announced software partnerships to accelerate adoption. Palantir's sovereign AI operating system reference architecture with NVIDIA now runs on Dell infrastructure for on-premises deployment, while ServiceNow customers will be able to leverage the Dell AI Factory to bring together infrastructure and enterprise workflow automation. OpenAI Codex will connect with the Dell AI Data Platform, allowing customers to bring the code assistant closer to internal context including codebases, documentation, and business systems.
The broader question facing the market is whether 67 percent of workloads running outside the cloud represents a genuine tipping point or early-stage experimentation. If that number reflects sustained production workloads, the infrastructure spending projections may prove conservative. If it represents pilot projects that enterprises eventually consolidate back to cloud providers, the 3 to 4 trillion dollar estimate could land later than 2030 or prove overstated. The next 12 to 18 months will clarify whether on-premises AI infrastructure becomes the dominant model or remains a specialized solution for security-sensitive use cases.