Why AI Companies Are Building Data Centers Everywhere But Still Can't Keep Up
The AI infrastructure crunch is forcing a fundamental rethinking of where and how computing happens. Two years ago, the industry obsessed over training massive language models. Today, the real bottleneck is running those models in production, and it's reshaping capital spending across the entire technology sector.
Why Is Inference Suddenly More Expensive Than Training?
Training an AI model is largely a one-time, intensive effort. You spend weeks or months, burn enormous amounts of electricity, and then the run is finished. Inference, by contrast, is continuous. Every time an employee uses a corporate AI assistant, an AI agent completes an automated workflow, or a customer talks to a chatbot, inference runs on a server somewhere. The meter never stops running.
According to recent research from Deloitte, inference operations will account for roughly two-thirds of all AI compute by 2026 as production scales up. That's a massive shift in how technology companies allocate their budgets. The cost of running AI applications is scaling significantly faster than the raw hardware costs are dropping, making enterprise customers increasingly sensitive to the efficiency of their cloud providers.
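The one-time versus continuous distinction can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name and every dollar and traffic figure are hypothetical placeholders, not numbers from Deloitte or any vendor. It estimates the first day on which cumulative inference spend passes a fixed training bill, assuming flat daily traffic and a flat per-request price.

```python
import math


def days_until_inference_exceeds_training(
    training_cost: float,
    requests_per_day: int,
    cost_per_1k_requests: float,
) -> int:
    """Return the first whole day on which cumulative inference spend
    exceeds the one-time training cost.

    Assumes constant traffic and constant pricing, which real
    deployments rarely have.
    """
    daily_cost = requests_per_day / 1000 * cost_per_1k_requests
    if daily_cost <= 0:
        raise ValueError("daily inference cost must be positive")
    # On day floor(training / daily) spend is at most equal to the
    # training cost; the following day is the first to exceed it.
    return math.floor(training_cost / daily_cost) + 1


# Hypothetical placeholder figures, chosen only to show the shape of the math.
days = days_until_inference_exceeds_training(
    training_cost=1_000_000.0,   # one-time training bill, in dollars
    requests_per_day=500_000,    # steady production traffic
    cost_per_1k_requests=10.0,   # dollars per thousand requests
)
print(f"Inference spend passes the training bill on day {days}")
```

With these placeholder inputs the crossover comes in under a year, and every day after that adds to the inference side of the ledger only.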
To make matters more complex, advanced reasoning techniques like test-time scaling and long thinking will dramatically increase the compute required for each individual inference request over the coming years. A simple prompt will trigger a cascade of background computations. Enterprise AI is getting much heavier.
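To see why hidden reasoning makes each request heavier, consider a rough back-of-the-envelope sketch. It relies on a common rule of thumb, assumed here rather than taken from the article, that a dense transformer spends about 2 floating-point operations per parameter for each token it processes. The model size and token counts are hypothetical, and the estimate ignores attention costs that grow with context length, caching, and batching.

```python
def request_flops(
    n_params: int,
    prompt_tokens: int,
    reasoning_tokens: int,
    answer_tokens: int,
) -> int:
    """Rough forward-pass FLOPs for a single request.

    Uses the approximation of ~2 FLOPs per parameter per token for a
    dense transformer. This is a coarse estimate, not a measurement.
    """
    total_tokens = prompt_tokens + reasoning_tokens + answer_tokens
    return 2 * n_params * total_tokens


N_PARAMS = 70_000_000_000  # hypothetical 70-billion-parameter model

# Same prompt and same visible answer; only the hidden reasoning differs.
plain = request_flops(N_PARAMS, prompt_tokens=200, reasoning_tokens=0, answer_tokens=300)
reasoning = request_flops(N_PARAMS, prompt_tokens=200, reasoning_tokens=5_000, answer_tokens=300)

print(f"Reasoning request costs {reasoning / plain:.1f}x the compute of the plain one")
```

The user sees the same short answer in both cases. The difference is entirely in the background tokens, which is why per-request compute can climb even when prompts and responses look unchanged.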
What Infrastructure Changes Are Companies Making Right Now?
The shift from training-focused data centers to inference-optimized ones is already underway. Vultr, a major cloud provider, has partnered with Hewlett Packard Enterprise (HPE) and NVIDIA to build next-generation AI infrastructure specifically designed for production workloads. The deployment focuses on accelerating production inference by integrating advanced liquid-cooled hardware and high-speed networking.
The technical specifications reveal how serious this pivot is. The new systems use NVIDIA's GB300 NVL72 architecture, supplied and integrated by HPE, with liquid cooling to manage the immense heat generated by high-density server configurations. The networking backbone relies on NVIDIA Spectrum-X Ethernet, using 400 Gb/s and 800 Gb/s interconnects. High-speed networking is no longer a luxury; it's a fundamental requirement for keeping processing units fully utilized and minimizing idle time.
Liquid cooling represents a pragmatic response to the physical constraints of modern data centers. Traditional air cooling is rapidly reaching its limits. As rack power densities climb to unprecedented levels, liquid cooling is transitioning from a niche high-performance computing solution into standard operating procedure for AI clouds.
How Are Companies Addressing the Land, Power, and Cooling Crisis?
The infrastructure challenge extends beyond what traditional data centers can handle. Physical constraints including land scarcity, power availability, and cooling pressure are driving interest in alternative approaches. Some companies are exploring a radical solution: moving compute infrastructure into orbit.
SpaceX's recent initial public offering (IPO) positioned orbital data centers as a serious investment opportunity, moving the idea from speculative engineering to a public market growth thesis. The IPO pitch extended beyond reusable rockets and Starlink connectivity to present SpaceX as an AI infrastructure company. SpaceX expects to begin deploying orbital AI compute satellites as early as 2028, and executives told investors the company aims to launch initial demonstrator systems by late 2027.
The appeal of orbital infrastructure is straightforward: space-based compute could reduce pressure on terrestrial land, cooling, and power constraints while leveraging solar power in orbit and satellite connectivity. However, the economics remain unproven. The market will not be decided by whether "data centers in space" sound compelling. It will be decided by which workloads can justify launch costs and operational complexity.
What Workloads Make Sense for Orbital vs. Terrestrial Infrastructure?
Not all AI workloads are created equal. Different applications have different bandwidth, latency, maintenance, hardware refresh, and governance requirements. Core enterprise systems like ERP, finance, supply chain, and HR will remain terrestrial for the foreseeable future because they require low-latency access, integration with enterprise applications, predictable support models, and regulatory clarity.
The stronger early market for orbital compute will likely come from workloads where location, resilience, power access, or geopolitical risk outweigh the added complexity of operating infrastructure in orbit. These include:
- Inference at Scale: Large-scale AI inference workloads that can tolerate slightly higher latency in exchange for lower power and cooling costs.
- Satellite-Native Edge Processing: Data processing tied directly to space-based networks and satellite operations.
- Sovereign or Resilient Storage: Workloads requiring geopolitical independence or resilience against terrestrial infrastructure failures.
- Defense and Intelligence Use Cases: Applications where location and resilience outweigh operational complexity.
Why Chip Supply Remains the Universal Constraint
Moving compute into orbit does not eliminate the need for advanced processors, memory, power systems, and specialized hardware. It simply changes where that infrastructure runs. Semiconductor production remains a universal constraint whether chips are deployed on Earth or in space.
SpaceX's broader hardware strategy is relevant here. The company's IPO filing referred to Terafab, a Tesla-Intel chip-making initiative intended to ease future chip shortages for SpaceX. Orbital data centers may address some power and cooling constraints, but they do not escape the supply chain limits shaping AI infrastructure on Earth. The same chip, manufacturing, and deployment bottlenecks still apply, with additional space-grade engineering requirements layered on top.
How to Evaluate Your Organization's AI Infrastructure Needs
Enterprise technology leaders should approach the infrastructure expansion strategically rather than assuming one solution fits all workloads. Here are key considerations:
- Latency Requirements: Determine whether your workload can tolerate the latency inherent in orbital or distributed terrestrial infrastructure, or whether it requires the low-latency access of traditional data centers.
- Data Gravity and Integration: Assess how tightly your AI workloads are coupled with existing enterprise applications and whether they require real-time data synchronization with on-premises systems.
- Operational Maturity: Only 21% of organizations have fully implemented AI operations for core IT functions, according to HyperFRAME Research. Evaluate whether your organization has the operational expertise to manage specialized inference infrastructure before scaling.
- Workload Discipline: Separate your AI use cases by their specific requirements rather than assuming all inference workloads have identical infrastructure needs.
- Cost Sensitivity: Calculate whether the efficiency gains from specialized infrastructure justify the operational complexity and potential latency trade-offs for your specific workloads.
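The considerations above can be turned into a simple first-pass triage. The sketch below is a hypothetical illustration, not an established framework: the `Workload` fields and the `triage` function are invented for this example, and the rule simply encodes the article's point that workloads needing low latency, tight integration, or regulatory clarity stay terrestrial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one AI workload (all fields illustrative)."""
    name: str
    needs_low_latency: bool
    tightly_integrated: bool        # coupled to existing enterprise applications
    needs_regulatory_clarity: bool


def triage(workload: Workload) -> tuple[str, list[str]]:
    """Return a placement suggestion and the reasons behind it.

    Any single hard requirement keeps the workload on traditional
    terrestrial infrastructure; otherwise it is only flagged for a
    closer look at alternatives, not automatically moved.
    """
    reasons = []
    if workload.needs_low_latency:
        reasons.append("low-latency access")
    if workload.tightly_integrated:
        reasons.append("integration with enterprise applications")
    if workload.needs_regulatory_clarity:
        reasons.append("regulatory clarity")
    if reasons:
        return "terrestrial", reasons
    return "evaluate alternatives", []


erp = Workload("ERP assistant", True, True, True)
batch = Workload("bulk document summarization", False, False, False)

for w in (erp, batch):
    placement, why = triage(w)
    print(f"{w.name}: {placement} {why}")
```

A real assessment would add cost and operational maturity, which do not reduce to yes-or-no flags, but separating workloads this way is a reasonable first step toward the workload discipline described above.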
The infrastructure landscape is fragmenting. Traditional cloud providers are building specialized inference-optimized data centers with advanced cooling and networking. Space companies are exploring orbital alternatives for specific use cases. Meanwhile, the fundamental constraint remains unchanged: advanced semiconductor supply limits how much compute capacity anyone can deploy, whether on Earth or in orbit.
The companies winning this infrastructure race are those that acknowledge the new reality of enterprise computing. Customers are demanding infrastructure that is purpose-built for AI inference rather than adapted from legacy systems. The partnership between Vultr, HPE, and NVIDIA serves as an ambitious blueprint for this new standard in cloud architecture, integrating computing, networking, memory, and cooling from the ground up.