The Memory Crunch: Why AI's Real Bottleneck Isn't Computing Power Anymore
Artificial intelligence systems are running into a new constraint that's reshaping how companies build data centers: they're running out of memory, not computing power. For years, the limiting factor in AI training was raw processing speed. Now, as models grow larger and inference workloads scale with user demand, high-bandwidth memory (HBM) has become the genuine bottleneck, and the supply chain for these specialized chips is tighter than ever.
What Changed in AI Infrastructure Demands?
Each new generation of AI hardware carries dramatically more HBM, a specialized type of stacked, ultra-fast memory that sits beside graphics processing units (GPUs) and other accelerators and feeds them data at extremely high bandwidth. The shift is fundamental: inference, the process of running trained models to generate responses, now scales almost linearly with the number of concurrent users. Because each active request keeps its own working state in memory while it is being served, memory has become as important as compute.
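One common reason inference memory grows with the number of users is the key-value (KV) cache that transformer models keep for every active conversation. The sketch below estimates that cache under stated assumptions: the model shape (80 layers, 8 key-value heads, 128-dimensional heads, 16-bit values) and the 4,096-token context are hypothetical choices for illustration, not figures from any system named in this article.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    concurrent_users: int,
    bytes_per_value: int = 2,  # assumption: 16-bit (fp16/bf16) cache entries
) -> int:
    """Estimate KV-cache memory for a transformer serving many users at once."""
    # Factor of 2: one key vector and one value vector per head, per layer, per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    # Every concurrent user holds a separate cache, so memory grows linearly with users.
    return per_token * seq_len * concurrent_users


if __name__ == "__main__":
    GIB = 2**30
    # Hypothetical model shape, chosen only for illustration.
    for users in (1, 8, 64):
        total = kv_cache_bytes(80, 8, 128, 4096, users)
        print(f"{users:>3} concurrent users -> {total / GIB:.2f} GiB of KV cache")
```

Under these assumptions the cache grows from 1.25 GiB for one user to 80 GiB for 64 users, before the model weights are even counted, which illustrates how serving capacity can end up limited by HBM rather than by arithmetic throughput.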
The technical reason is straightforward. AI training at massive scale requires thousands of GPUs to exchange data continuously: each chip's computation results must be combined with those of every other chip before the model is updated in lockstep. Any delay, whether in reading memory or in moving data between chips, leaves expensive compute sitting idle. This is why Microsoft's new Fairwater campus in Wisconsin uses a two-story building design with vertical GPU racks connected by 800-gigabit-per-second Ethernet, shortening the physical distance data must travel and reducing latency.
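The step where chips share their results and update the model together is usually implemented as a collective operation called all-reduce. The article does not say which algorithm Fairwater's software uses; the sketch below simulates ring all-reduce, one widely used approach, in plain Python. Each inner list stands in for one GPU's gradient buffer, and the function name is hypothetical.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce: every worker ends with the element-wise sum of all buffers."""
    n = len(buffers)
    length = len(buffers[0])
    # Split each buffer into n contiguous chunks; chunk k covers indices [lo, hi).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]
    bufs = [list(b) for b in buffers]  # copy so the inputs are untouched

    # Phase 1, reduce-scatter: after n - 1 steps, worker i holds the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all outgoing chunks first so every send in a step happens "simultaneously".
        outgoing = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            outgoing.append((c, bufs[i][lo:hi]))
        for i, (c, data) in enumerate(outgoing):
            dst = (i + 1) % n  # each worker sends only to its right-hand neighbor
            lo, _ = bounds[c]
            for j, value in enumerate(data):
                bufs[dst][lo + j] += value

    # Phase 2, all-gather: pass the fully reduced chunks around so every worker gets all of them.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            outgoing.append((c, bufs[i][lo:hi]))
        for i, (c, data) in enumerate(outgoing):
            dst = (i + 1) % n
            lo, hi = bounds[c]
            bufs[dst][lo:hi] = data  # overwrite with the finished chunk

    return bufs


if __name__ == "__main__":
    grads = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    for worker, buf in enumerate(ring_allreduce(grads)):
        # Dividing by the worker count turns the sum into the averaged gradient.
        print(worker, [v / len(grads) for v in buf])
```

In a ring, each worker only ever talks to its neighbor and sends about 2 × (N - 1) / N of its buffer in total, regardless of cluster size. That keeps per-link traffic bounded, but every step waits on the slowest link, so latency or congestion anywhere in the ring stalls all of the GPUs.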
Why Do Only Three Companies Control This Critical Supply?
Advanced HBM stacks 12 to 16 individual DRAM (dynamic random-access memory) dies into a single package, a manufacturing process that demands extraordinary precision. Only three companies in the world are qualified to produce this advanced HBM at scale: Micron Technology, SK Hynix, and Samsung Electronics.
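Die count translates directly into capacity, while bandwidth depends on the width and speed of the stack's interface. The sketch below makes that arithmetic explicit. The 24-gigabit die density, 1,024-bit interface, and 9.2 Gb/s pin rate are illustrative assumptions roughly in line with current HBM3E parts, not figures from this article or from any vendor's datasheet.

```python
def stack_capacity_gb(dies: int, die_gbit: int = 24) -> float:
    """Capacity of one HBM stack in gigabytes (die density in gigabits is an assumption)."""
    return dies * die_gbit / 8  # 8 bits per byte


def stack_bandwidth_gb_per_s(interface_bits: int = 1024, pin_gbit_per_s: float = 9.2) -> float:
    """Peak bandwidth of one stack in gigabytes per second (width and pin rate are assumptions)."""
    return interface_bits * pin_gbit_per_s / 8


if __name__ == "__main__":
    for dies in (12, 16):
        print(f"{dies}-high stack: {stack_capacity_gb(dies):.0f} GB")
    print(f"per-stack peak bandwidth: {stack_bandwidth_gb_per_s():.1f} GB/s")
```

Under these assumptions a 12-high stack holds 36 GB and a 16-high stack 48 GB, while peak bandwidth per stack stays near 1.2 TB/s because it is set by the interface rather than by the number of dies.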
This oligopoly has enormous implications for AI infrastructure spending. Gavin Baker, manager of Atreides Management, predicts that HBM DRAM will account for 30 to 40 percent of all hyperscaler capital expenditure in 2027, driven by AI training and inference needs. Baker's earlier call on Micron has returned 14 times the initial investment, and he argues that the memory constraint will support elevated pricing and margins for these three suppliers well into the next decade.
The scale of hyperscaler spending confirms this priority. Amazon, Microsoft, Google, and Meta combined are forecasting $600 billion to $725 billion in capital expenditure for 2026, with much of it tied directly to AI infrastructure including GPUs, servers, data centers, and power systems. These companies are not pulling back on spending despite HBM's high cost; instead, they're raising guidance, signaling confidence that AI revenue will justify the investment.
How Are Hyperscalers Addressing the Memory Bottleneck?
- Vertical Stacking: Microsoft's Fairwater campus arranges GPU racks across two stories with through-floor networking to reduce the physical distance between chips, minimizing the communication latency that leaves GPUs waiting on data.
- Custom Networking Protocols: Microsoft developed Multi-Path Reliable Connected (MRC), a software layer built with OpenAI and NVIDIA that provides advanced congestion control and load balancing over commodity Ethernet hardware rather than proprietary networking gear.
- Closed-Loop Cooling: Fairwater uses a closed-loop water cooling system that is filled once during construction and then continuously recirculated, allowing the facility to support 140-kilowatt GPU racks without the ongoing water consumption of traditional evaporative cooling.
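The article does not describe how MRC works internally. As a rough illustration of what load balancing across commodity Ethernet can mean, the sketch below splits one message into chunks and sends each chunk down the least-loaded path that is not flagged as congested. The `Path` class, the `spray` function, the chunk size, and the congestion flag are all hypothetical; this is a generic multi-path technique, not MRC's actual algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    """One hypothetical network path between two endpoints."""
    name: str
    outstanding_bytes: int = 0  # bytes assigned to this path but not yet acknowledged
    congested: bool = False     # stand-in for a real congestion signal


def pick_path(paths: List[Path]) -> Path:
    # Prefer uncongested paths; fall back to all paths if every one is congested.
    candidates = [p for p in paths if not p.congested] or paths
    # Among the candidates, choose the least loaded (ties go to the earliest path).
    return min(candidates, key=lambda p: p.outstanding_bytes)


def spray(message_bytes: int, chunk_bytes: int, paths: List[Path]) -> List[str]:
    """Assign each chunk of a message to a path and return the path names in send order."""
    plan = []
    remaining = message_bytes
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        path = pick_path(paths)
        path.outstanding_bytes += size
        plan.append(path.name)
        remaining -= size
    return plan


if __name__ == "__main__":
    paths = [Path("path0"), Path("path1", congested=True), Path("path2"), Path("path3")]
    plan = spray(1 << 20, 64 << 10, paths)  # 1 MiB message in 64 KiB chunks
    print({p.name: plan.count(p.name) for p in paths})
```

Spreading one transfer over several paths keeps a single hot link from stalling the whole exchange. A production protocol would also need retransmission, reordering, and real congestion feedback, all of which this sketch omits.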
The Fairwater campus, which cost $7.3 billion and reached full operational status in April 2026, represents the most technically integrated approach to these constraints. The facility links hundreds of thousands of NVIDIA GB200 Blackwell GPUs into a single coherent cluster; each rack holds 72 GPUs connected through NVLink 5.0, the fifth-generation NVLink interconnect, which provides 1.8 terabytes per second of GPU-to-GPU bandwidth per GPU.
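The rack-level figures quoted in this section can be combined with simple arithmetic. The sketch uses only numbers stated in the article (72 GPUs per rack, 1.8 TB/s of NVLink bandwidth per GPU, 140-kilowatt racks). Dividing rack power by GPU count is a crude average that also absorbs CPUs, switches, and other rack components, so treat it as an order-of-magnitude figure rather than a measured per-GPU draw.

```python
GPUS_PER_RACK = 72              # stated in the article for each GB200 rack
NVLINK_TB_PER_S_PER_GPU = 1.8   # stated NVLink 5.0 GPU-to-GPU bandwidth
RACK_POWER_KW = 140             # stated Fairwater rack power


def aggregate_nvlink_tb_per_s(gpus: int, per_gpu_tb_per_s: float) -> float:
    """Total NVLink bandwidth inside one rack, summed across all GPUs."""
    return gpus * per_gpu_tb_per_s


def average_kw_per_gpu(rack_kw: float, gpus: int) -> float:
    """Rack power split evenly per GPU; includes non-GPU parts, so it overstates GPU draw."""
    return rack_kw / gpus


if __name__ == "__main__":
    print(f"aggregate NVLink bandwidth: {aggregate_nvlink_tb_per_s(GPUS_PER_RACK, NVLINK_TB_PER_S_PER_GPU):.1f} TB/s")
    print(f"average power per GPU slot: {average_kw_per_gpu(RACK_POWER_KW, GPUS_PER_RACK):.2f} kW")
```

The roughly 130 TB/s of aggregate in-rack bandwidth is what lets a 72-GPU rack behave like one large accelerator, while traffic leaving the rack travels over 800-gigabit-per-second (100-gigabyte-per-second) Ethernet links.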
Microsoft also deployed more than 120,000 miles of new fiber across the United States to support its AI Wide Area Network (AI WAN), a dedicated optical fiber backbone that lets geographically separated Fairwater sites participate in the same training job simultaneously. This investment reflects the reality that easing the memory and data-movement bottleneck now requires not just better chips but better connectivity between data centers.
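Distance matters because signals in optical fiber travel at roughly two-thirds of the speed of light in a vacuum, about 5 microseconds per kilometer. The sketch below compares propagation delay over a rack-scale link with a long-haul WAN link. The refractive index of 1.468 is a typical value for single-mode fiber, and the 10-meter and 1,000-kilometer distances are hypothetical examples, not measured Fairwater figures; real latency adds switching, queuing, and serialization on top.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FIBER_REFRACTIVE_INDEX = 1.468  # assumption: typical single-mode fiber


def one_way_delay_s(distance_m: float, refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Propagation delay through fiber only; ignores switches, queuing, and serialization."""
    return distance_m / (SPEED_OF_LIGHT_M_PER_S / refractive_index)


if __name__ == "__main__":
    # Hypothetical distances chosen only to contrast the two scales.
    for label, meters in (("in-building link, 10 m", 10), ("WAN link, 1,000 km", 1_000_000)):
        delay = one_way_delay_s(meters)
        print(f"{label}: {delay * 1e6:.3f} microseconds one way")
```

The two delays differ by about five orders of magnitude (roughly 49 nanoseconds versus 4.9 milliseconds), which helps explain why cross-site training needs a dedicated backbone and must tolerate far higher latency than communication inside a rack.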
What Does This Mean for AI Infrastructure Costs?
HBM is expensive and often accounts for a substantial share of the bill of materials (BOM) for high-end GPUs. Together with power, cooling, and networking costs, memory pricing directly shapes the overall economics of AI infrastructure. Even so, hyperscalers are maintaining or raising their capital expenditure guidance rather than slowing down, a sign that they expect AI revenue to outpace the depreciation expense of the infrastructure built to support it.
"HBM DRAM will be 30-40% of all hyperscaler capex in 2027," noted Gavin Baker, manager of Atreides Management, emphasizing that only Micron, SK Hynix, and Samsung can produce the advanced HBM packages with 12-16 dies required at scale.
This shift from compute-bound to memory-bound AI infrastructure has profound implications for semiconductor supply chains and data center design. Companies that can secure reliable HBM supply and design systems that minimize memory latency will have a structural advantage in the race to build the largest, most efficient AI training facilities. The next decade of AI infrastructure will be defined not by who builds the fastest GPUs, but by who solves the memory problem most elegantly.