Why the U.S. Government Is Ditching GPUs for a Completely Different Chip Architecture

FrontierNews.ai AI Research Desk

The U.S. Department of Energy is moving away from graphics processing units (GPUs) for its most demanding supercomputers, embracing a radically different chip architecture designed specifically for scientific computing. Sandia National Laboratories recently approved NextSilicon's Maverick-2 dataflow processor, marking the first major validation of this alternative approach for high-performance computing (HPC). While nine of the world's top 10 supercomputers currently run on GPUs, this shift suggests that era may be ending.

What's Wrong With GPUs for Scientific Computing?

The problem isn't that GPUs are bad at computing; it's that they've been optimized for the wrong kind of math. Nvidia's latest Rubin GPUs, arriving later this year, promise massive performance for artificial intelligence (AI) workloads by delivering up to 50 petaFLOPS of FP4 compute. FP4 refers to a low-precision number format that's perfect for AI training and inference, where approximate answers are acceptable. But scientific computing demands something entirely different.

Scientists running simulations for nuclear weapons physics, bioweapons defense, and public health need 64-bit floating point (FP64) precision, a far more precise and more hardware-intensive format in which rounding errors stay small enough to trust over long-running simulations. Nvidia's Rubin tops out at just 33 teraFLOPS of native FP64 compute, making it slower at FP64 than the company's H100 chip from nearly four years ago. To compensate, Nvidia is using a workaround called the Ozaki scheme, which uses lower-precision math to emulate FP64 results. The problem: this approach works well for some workloads but fails for others, particularly computational fluid dynamics simulations that are heavy on vector operations.
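To see why precision matters for long-running calculations, consider a minimal Python sketch that adds the same small number a million times at two precisions. It is an illustration only: it uses 32-bit floats (FP32) as a stand-in for low-precision hardware, since FP4 is coarser still, and the helper names to_f32 and accumulate are invented for this example. It does not model the Ozaki scheme or any Nvidia hardware.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (which is FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, use_fp32: bool) -> float:
    """Add `step` to a running total `count` times at the chosen precision."""
    if use_fp32:
        step = to_f32(step)
    total = 0.0
    for _ in range(count):
        total = total + step
        if use_fp32:
            # Round after every addition to mimic FP32 arithmetic.
            total = to_f32(total)
    return total


COUNT = 1_000_000
EXACT = 100_000.0  # 0.1 * 1,000,000 in exact arithmetic

err64 = abs(accumulate(0.1, COUNT, use_fp32=False) - EXACT)
err32 = abs(accumulate(0.1, COUNT, use_fp32=True) - EXACT)

print(f"FP64 accumulated error: {err64:.3e}")
print(f"FP32 accumulated error: {err32:.3e}")
```

The FP64 total stays close to 100,000, while the FP32 total drifts visibly because each rounding step compounds. A simulation that takes millions of time steps faces the same effect, which is why scientific codes lean on FP64.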

How Does NextSilicon's Dataflow Approach Work Differently?

Instead of the traditional von Neumann architecture that powers most CPUs and GPUs, NextSilicon's Maverick-2 uses a reconfigurable dataflow architecture. Think of it as a grid of specialized math units connected like a pipeline. Each unit performs a specific operation, whether addition, multiplication, or logic. The real innovation is that data flows through this grid continuously; as soon as data reaches the next unit, it's computed immediately without waiting for traditional load-store operations that shuffle data around in memory.
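A rough software analogy for that pipeline can be written with Python generators. This is a sketch under stated assumptions, not a model of the Maverick-2: the unit names add_unit and mul_unit are invented here, and Python runs the stages one after another, whereas dataflow hardware would run its units in parallel.

```python
from typing import Iterable, Iterator


def add_unit(a: Iterable[float], b: Iterable[float]) -> Iterator[float]:
    """Adder: emits a sum as soon as one operand arrives on each input."""
    for x, y in zip(a, b):
        yield x + y


def mul_unit(a: Iterable[float], b: Iterable[float]) -> Iterator[float]:
    """Multiplier: emits a product as soon as one operand arrives on each input."""
    for x, y in zip(a, b):
        yield x * y


def pipeline(a: Iterable[float], b: Iterable[float], c: Iterable[float]) -> Iterator[float]:
    """Wire the units together to compute (a + b) * c element by element.

    The adder's output feeds the multiplier directly; no intermediate
    array of sums is ever written out and read back.
    """
    return mul_unit(add_unit(a, b), c)


a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
c = [2.0, 2.0, 2.0]

results = list(pipeline(a, b, c))
print(results)
```

Each value passes from the adder straight into the multiplier. That hand-off between units is the part of the design that replaces the load-store traffic described above.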

This approach isn't entirely new. Companies like Groq, Cerebras, and SambaNova have built dataflow chips, but they've focused on AI inference and training. NextSilicon is one of the few applying this architecture to scientific computing. The challenge with dataflow chips has always been programming difficulty, and NextSilicon says it has addressed this by building a compiler that translates existing C, Python, Fortran, and CUDA code to run on its hardware. The compiler captures the compute graph from CPU execution, maps it to the chip, and optimizes it for maximum performance.
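The idea of capturing a compute graph from execution can be sketched in a few lines of Python. This toy tracer is not NextSilicon's compiler, and every name in it (Graph, Traced, trace, kernel) is hypothetical. It only shows the general technique: run ordinary code once and record each operation as a node.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A list of (operation, input node ids) tuples, in execution order."""
    nodes: list = field(default_factory=list)

    def add(self, op: str, *inputs: int) -> int:
        self.nodes.append((op, inputs))
        return len(self.nodes) - 1  # id of the node just added


class Traced:
    """A value that computes normally but records every operation it joins."""

    def __init__(self, value: float, graph: Graph, node_id: int):
        self.value = value
        self.graph = graph
        self.node_id = node_id

    def _binary(self, other: "Traced", op: str, result: float) -> "Traced":
        node_id = self.graph.add(op, self.node_id, other.node_id)
        return Traced(result, self.graph, node_id)

    def __add__(self, other: "Traced") -> "Traced":
        return self._binary(other, "add", self.value + other.value)

    def __mul__(self, other: "Traced") -> "Traced":
        return self._binary(other, "mul", self.value * other.value)


def trace(fn, *args: float):
    """Run `fn` once on traced inputs; return its result and the recorded graph."""
    graph = Graph()
    inputs = [Traced(v, graph, graph.add("input")) for v in args]
    out = fn(*inputs)
    return out.value, graph


def kernel(x, y, z):
    # Ordinary-looking code; the tracer sees it as two connected operations.
    return (x + y) * z


value, graph = trace(kernel, 1.0, 2.0, 4.0)
print(value)
for node_id, (op, inputs) in enumerate(graph.nodes):
    print(node_id, op, inputs)
```

The recorded graph, with three inputs feeding an add and then a multiply, is the kind of structure a compiler could then map onto a grid of hardware units.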

Sandia validated the Maverick-2 across three critical scientific workloads: the high-performance conjugate gradient (HPCG) benchmark, the LAMMPS molecular dynamics test suite, and the SPARTA Monte Carlo simulation suite. According to NextSilicon, a single Maverick-2 can deliver about 600 gigaFLOPS of FP64 compute on the HPCG benchmark, roughly matching leading GPUs while consuming half the power.
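For readers wondering why an HPCG figure in gigaFLOPS sits so far below the teraFLOPS peaks quoted earlier, it helps to look at what a conjugate gradient solver actually does. The sketch below is a plain, unpreconditioned conjugate gradient on a tiny one-dimensional problem, written for illustration. HPCG itself runs a preconditioned solver on a much larger three-dimensional grid, and the function names here are this example's own.

```python
def laplacian_matvec(x):
    """Return A @ x for the 1D Laplacian stencil (2 on the diagonal, -1 beside it)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given only matvec."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x, with x starting at zero
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for iteration in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, iteration
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


b = [1.0] * 50
x, iterations = conjugate_gradient(laplacian_matvec, b)

# Check the answer against the original system rather than the running residual.
residual = [bi - yi for bi, yi in zip(b, laplacian_matvec(x))]
residual_norm = dot(residual, residual) ** 0.5
print(iterations, residual_norm)
```

Every iteration is dominated by a sparse matrix-vector product and a few dot products, each of which performs very little arithmetic per value fetched from memory. Results on HPCG are therefore typically a small fraction of a chip's peak FLOPS, and they reward designs that keep data moving rather than designs with the most raw math throughput.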

Steps to Understanding the Shift Away From GPU-Centric Supercomputing

The AI-HPC Divergence: Nvidia and AMD have optimized their latest accelerators for AI workloads, which require lower precision and massive throughput. This leaves scientific computing, which demands exact 64-bit calculations, without ideal hardware solutions.
Dataflow Architecture Benefits: NextSilicon's approach sidesteps many memory bottlenecks by computing data immediately as it flows through the pipeline, which the company says improves both performance and energy efficiency compared to traditional GPU designs.
Compiler-Based Compatibility: Rather than forcing scientists to rewrite decades of code, NextSilicon built tools that automatically translate existing scientific software to run on its chips, removing a major barrier to adoption.
Government Validation: Sandia's approval of the Maverick-2 for production use signals that the U.S. Department of Energy is serious about moving beyond GPUs for its most critical simulations.

Why Is AMD Taking a Different Path?

AMD has taken a more pragmatic approach than Nvidia. While its MI455X accelerators are tuned for AI inference and training, the company also developed the MI430X specifically for HPC workloads. The MI430X delivers up to 200 teraFLOPS of peak FP64 performance, making it suitable for the Department of Energy's upcoming Discovery supercomputer and Europe's Alice Recoque system. This dual-track strategy acknowledges that AI and scientific computing have fundamentally different hardware needs.

What Does This Mean for the Future of U.S. Supercomputing?

The Spectra supercomputer at Sandia is small by modern standards, with just 64 nodes and 128 Maverick-2 accelerators. But it's a proof of concept. If NextSilicon can scale its technology to larger systems, it could reshape how the Department of Energy builds its most powerful machines. The stakes are high; these supercomputers simulate nuclear weapons physics, model disease spread, and run other critical national security simulations.

Interestingly, China has already demonstrated that boutique silicon can compete with Western supercomputers. The country has built custom processors specifically for scientific computing, including the Sunway TaihuLight, built on custom 260-core RISC processors, and the Tianhe-2A with a homegrown digital signal processor called the Matrix-2000. More recently, China reportedly developed the LineShine supercomputer using 47,000 custom CPUs designed to deliver 2 exaFLOPS of FP64 performance. Because China no longer participates in the twice-yearly Top500 ranking of fastest supercomputers, the world may never know exactly how these systems perform.

The broader lesson is clear: as AI has made Nvidia the world's most valuable chipmaker, the company's focus has shifted almost entirely to artificial intelligence. That leaves a gap in the market for scientific computing, and startups like NextSilicon are rushing to fill it. For the U.S. government, the message is equally clear: relying on GPUs designed for AI means accepting compromises on the precision and efficiency needed for scientific simulation. NextSilicon's approval by Sandia suggests the Department of Energy is ready to explore alternatives, even if those alternatives come from startups rather than established giants.
