CoreWeave Brings Nvidia's Most Powerful AI Chip to Production, Boosting Inference Speed for Moonshot AI's Kimi

FrontierNews.ai AI Research Desk

CoreWeave has become the first AI cloud provider to bring Nvidia's newest and most capable rack-scale AI system, the Vera Rubin NVL72, into production at full scale. The milestone matters because, as AI models grow larger and must reason continuously in real-world applications, the speed and efficiency of inference (the process of running a trained model to generate outputs) has become the main bottleneck on how fast AI companies can operate and grow.

What Makes Vera Rubin Different From Previous AI Chips?

The Vera Rubin NVL72 is a significant step forward in AI infrastructure design. Each rack contains 72 Nvidia Rubin GPUs (graphics processing units, the specialized chips that power AI computations) paired with 36 Nvidia Vera CPUs (central processing units), all connected by Nvidia's sixth-generation NVLink fabric, a high-speed interconnect that gives the rack a combined 260 terabytes per second of chip-to-chip bandwidth.
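
The rack figures above lend themselves to quick arithmetic. The short Python sketch below derives the GPU-to-CPU ratio and a naive per-GPU share of the 260 TB/s NVLink figure. Splitting the aggregate evenly across GPUs is a simplifying assumption for illustration, not a figure stated in the article, and real traffic rarely uses every link at full rate.

```python
# Figures stated in the article for one Vera Rubin NVL72 rack.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36
NVLINK_AGGREGATE_TBPS = 260.0  # terabytes per second, across the whole rack


def gpus_per_cpu(gpus: int, cpus: int) -> float:
    """Ratio of GPUs to CPUs in a rack."""
    return gpus / cpus


def nvlink_per_gpu_tbps(aggregate_tbps: float, gpus: int) -> float:
    """Naive even split of aggregate NVLink bandwidth across GPUs.

    Assumption: the aggregate figure divides evenly; actual per-link
    utilization depends on the workload's communication pattern.
    """
    return aggregate_tbps / gpus


if __name__ == "__main__":
    print(f"GPUs per CPU: {gpus_per_cpu(GPUS_PER_RACK, CPUS_PER_RACK):.0f}")
    per_gpu = nvlink_per_gpu_tbps(NVLINK_AGGREGATE_TBPS, GPUS_PER_RACK)
    print(f"NVLink per GPU: ~{per_gpu:.2f} TB/s")
```

Run as a script, it prints a ratio of 2 GPUs per CPU and roughly 3.61 TB/s of NVLink bandwidth per GPU.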

Compared with Nvidia's previous-generation Blackwell chips, Vera Rubin delivers dramatic efficiency gains that translate directly into cost savings and speed improvements. The new architecture achieves up to 10 times better inference performance per watt of electricity consumed, can accomplish the same work with as few as one-fourth as many GPUs, and costs approximately one-tenth as much per million tokens processed. For context, a token is the small unit of text, often a whole word or part of one, that AI models read and generate.
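
To see what these multipliers mean in practice, the following Python sketch applies them to a baseline deployment. The baseline numbers (1,000 GPUs, $2.00 per million tokens, 5 tokens per joule) and the monthly volume are hypothetical placeholders, and the multipliers are best-case "up to" figures, so treating all three as achievable at once is a simplifying assumption.

```python
from dataclasses import dataclass

# Best-case multipliers quoted in the article (Vera Rubin vs. Blackwell).
PERF_PER_WATT_GAIN = 10.0    # up to 10x inference performance per watt
GPU_COUNT_FACTOR = 0.25      # one-fourth as many GPUs for the same work
COST_PER_TOKEN_FACTOR = 0.1  # roughly one-tenth the cost per million tokens


@dataclass(frozen=True)
class Deployment:
    gpus: int
    cost_per_million_tokens: float  # USD (hypothetical)
    tokens_per_joule: float         # energy efficiency (hypothetical)


def project_vera_rubin(baseline: Deployment) -> Deployment:
    """Apply the best-case multipliers to a hypothetical Blackwell baseline."""
    return Deployment(
        gpus=max(1, round(baseline.gpus * GPU_COUNT_FACTOR)),
        cost_per_million_tokens=baseline.cost_per_million_tokens * COST_PER_TOKEN_FACTOR,
        tokens_per_joule=baseline.tokens_per_joule * PERF_PER_WATT_GAIN,
    )


def monthly_cost(d: Deployment, million_tokens: float) -> float:
    """Token-serving cost in USD for a given monthly volume."""
    return d.cost_per_million_tokens * million_tokens


if __name__ == "__main__":
    blackwell = Deployment(gpus=1000, cost_per_million_tokens=2.00, tokens_per_joule=5.0)
    rubin = project_vera_rubin(blackwell)
    volume = 500.0  # hypothetical: 500 million tokens per month
    print(f"GPUs: {blackwell.gpus} -> {rubin.gpus}")
    print(f"Monthly cost: ${monthly_cost(blackwell, volume):,.2f} -> "
          f"${monthly_cost(rubin, volume):,.2f}")
    print(f"Tokens per joule: {blackwell.tokens_per_joule} -> {rubin.tokens_per_joule}")
```

With these placeholder inputs the script prints 1000 -> 250 GPUs and $1,000.00 -> $100.00 per month; the ratios are the point, not the dollar amounts.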

How Is CoreWeave Optimizing This Hardware for Real-World Use?

Bringing a cutting-edge platform from the lab to production at scale takes more than plugging in new hardware. CoreWeave developed several purpose-built systems to ensure Vera Rubin performs reliably under the demanding workloads of enterprise customers:

Software-Defined Liquid Cooling: CoreWeave created a system called Valvey, a programmable valve assembly that turns cooling from a passive mechanical system into an actively managed, software-controlled surface. Valvey monitors flow rate, temperature, and pressure and detects leaks in real time, enabling automated isolation and emergency shutdown of individual racks without disrupting neighboring systems on a shared cooling loop.
Unified Rack Control: A new appliance called Racky aggregates power, cooling, and environmental sensors into a single management interface, allowing each Vera Rubin rack to be managed as a cloud resource rather than a custom one-off build.
Multi-Rail Networking: CoreWeave supports both Nvidia Quantum-X800 InfiniBand and Nvidia Spectrum-X Ethernet with RDMA (Remote Direct Memory Access) over Converged Ethernet, delivering 1.6 terabits per second of backend bandwidth per GPU and scaling to configurations with hundreds of thousands of GPUs.
Enhanced Security and Isolation: CoreWeave is using Nvidia BlueField-4 data processing units to enable faster data access, lower latency, and stronger tenant isolation at scale, allowing multiple customers to safely share the same infrastructure.
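
The article does not describe how Valvey decides when to act. As a purely hypothetical illustration of per-rack isolation on a shared cooling loop, the Python sketch below checks each rack's own flow, temperature, pressure, and leak readings against made-up limits and picks an action for that rack alone. The class names, units, and thresholds are assumptions, not CoreWeave's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    OK = "ok"
    ISOLATE = "isolate"              # close this rack's valves only
    EMERGENCY_SHUTDOWN = "shutdown"  # isolate and power down this rack


@dataclass(frozen=True)
class RackReading:
    rack_id: str
    flow_lpm: float       # coolant flow, litres per minute (hypothetical unit)
    outlet_temp_c: float  # coolant outlet temperature, Celsius
    pressure_kpa: float   # loop pressure at the rack, kilopascals
    leak_detected: bool


# Hypothetical limits, for illustration only.
MIN_FLOW_LPM = 40.0
MAX_OUTLET_TEMP_C = 45.0
MAX_PRESSURE_KPA = 400.0
SHUTDOWN_TEMP_C = 55.0


def decide(r: RackReading) -> Action:
    """Choose an action for one rack from that rack's reading alone."""
    if r.leak_detected or r.outlet_temp_c >= SHUTDOWN_TEMP_C:
        return Action.EMERGENCY_SHUTDOWN
    if (r.flow_lpm < MIN_FLOW_LPM
            or r.outlet_temp_c > MAX_OUTLET_TEMP_C
            or r.pressure_kpa > MAX_PRESSURE_KPA):
        return Action.ISOLATE
    return Action.OK


def scan_loop(readings: list[RackReading]) -> dict[str, Action]:
    """Evaluate every rack on a shared loop independently.

    Each decision depends only on that rack's reading, so a faulty rack
    does not change the action taken for its neighbors.
    """
    return {r.rack_id: decide(r) for r in readings}


if __name__ == "__main__":
    readings = [
        RackReading("r1", 60.0, 38.0, 250.0, False),
        RackReading("r2", 60.0, 39.0, 255.0, True),   # leak
        RackReading("r3", 30.0, 41.0, 240.0, False),  # low flow
    ]
    for rack_id, action in scan_loop(readings).items():
        print(f"{rack_id}: {action.value}")
```

Here rack r2 is shut down for a leak and r3 is isolated for low flow, while r1 keeps running, mirroring the "isolate one rack without disrupting its neighbors" behavior the article describes.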

These innovations represent what CoreWeave calls "full-stack orchestration," the deep engineering work required to make laboratory performance translate into reliable production performance.

Which Companies Are Already Benefiting From This Deployment?

Jane Street, a major quantitative trading firm, is among the early customers using CoreWeave's Vera Rubin infrastructure. The firm has already scaled on CoreWeave across Nvidia's Hopper and Blackwell generations and is now expanding to Vera Rubin.

"Our research depends on infrastructure that's both powerful and reliable, and CoreWeave has delivered on this as we've scaled across Nvidia Hopper and Blackwell. Their ability to deliver highly performant clusters with full cluster observability and a support team that engages deeply on hard problems gives us the confidence to partner with them on Vera Rubin. We are excited about the efficiency gains at rack scale translating into faster training runs and shorter iteration cycles for our researchers," said Craig Falls, head of quantitative research at Jane Street.

CoreWeave's infrastructure has also earned independent recognition. The company received the top Platinum rating in both SemiAnalysis ClusterMAX 1.0 and 2.0, the only AI cloud provider to earn that rating in both editions. Most notably for AI model providers, CoreWeave ranked number one for inference speed and price-performance when running Moonshot AI's Kimi K2.6 model, according to independent benchmarking by Artificial Analysis.

What Role Did Hardware Partners Play in This Achievement?

Bringing Vera Rubin to production required collaboration across the entire infrastructure stack. Dell Technologies provided the architectural foundation with its PowerEdge XE9812 servers, engineered specifically for the density and precision demands of modern AI workloads. The deployment also uses Micron 7600 SSDs (solid-state drives), among the first liquid-cooled NVMe storage deployed at rack scale, which improve energy efficiency.

"Dell Technologies and CoreWeave share a commitment to delivering innovation that performs at the frontier of what AI demands. The PowerEdge XE9812 was engineered for exactly this kind of density and precision. Working with CoreWeave to bring up the first Nvidia Vera Rubin NVL72 rack is a direct validation of what enterprise-grade hardware can do when it's paired with the right operational expertise," stated Michael Dell, chairman and CEO of Dell Technologies.

Why Does This Matter for the Future of AI?

The shift toward agentic AI, where models operate continuously and reason through complex problems over extended sessions, is fundamentally changing what infrastructure needs to deliver. As context windows expand to millions of tokens (allowing models to process vastly more information at once) and models reach trillion-parameter scale (containing trillions of individual numerical values that define the model's behavior), inference efficiency has become the defining constraint on how quickly AI companies can scale.
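
A back-of-the-envelope calculation shows why these scales strain inference. The Python sketch below computes the memory needed just to hold the weights of a one-trillion-parameter model at common numeric precisions, plus the key-value (KV) cache a standard transformer keeps for a one-million-token context. The model configuration (61 layers, 8 KV heads, head dimension 128, 1-byte cache entries) is hypothetical, and architectures that compress the cache will need less.

```python
def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Memory to store model weights alone (no activations or cache)."""
    return num_params * bytes_per_param


def kv_cache_bytes(seq_len: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: float, batch: int = 1) -> float:
    """KV cache size for a standard multi-head or grouped-query attention model.

    The factor of 2 counts one key and one value vector per KV head,
    per layer, per token in the context.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len * batch


if __name__ == "__main__":
    params = 1e12  # one trillion parameters
    for name, b in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
        print(f"{name} weights: {weight_bytes(params, b) / 1e12:.1f} TB")

    # Hypothetical model configuration, for illustration only.
    kv = kv_cache_bytes(seq_len=1_000_000, num_layers=61, num_kv_heads=8,
                        head_dim=128, bytes_per_elem=1.0)
    print(f"KV cache for a 1M-token context: {kv / 1e9:.1f} GB")
```

The script reports 2.0, 1.0, and 0.5 TB of weights at FP16, FP8, and FP4, and about 124.9 GB of KV cache for a single one-million-token session, which grows linearly with every additional concurrent user.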

CoreWeave's successful production deployment of Vera Rubin demonstrates that the infrastructure ecosystem is keeping pace with these demands. The combination of more efficient hardware, purpose-built software innovations, and deep operational expertise creates a foundation for the next generation of AI applications that require both raw power and production-grade reliability.
