Why NVIDIA Stock Rose Even as OpenAI Cuts Its Chip Dependency in Half
NVIDIA's stock price climbed nearly 2% on June 30, reaching a $4.8 trillion valuation, even as OpenAI, one of its largest customers, announced it could run on far fewer chips and unveiled its own custom processor called Jalapeño. This apparent paradox reveals a deeper truth about how AI infrastructure is evolving: while custom chips are gaining ground in inference (running trained models), NVIDIA's grip on training (building those models) and its software ecosystem remain nearly unbreakable.
The puzzle starts with OpenAI's recent moves. Engineers at the company cut inference costs by more than half using new optimization methods, according to reporting from The Information. On June 24, OpenAI and Broadcom unveiled Jalapeño, a custom chip designed specifically for running ChatGPT and other large language models (LLMs). OpenAI plans to deploy these chips at gigawatt scale by the end of 2026, with Microsoft as the lead partner. Yet NVIDIA hardware still runs most of OpenAI's inference work, and OpenAI continues to invest heavily in NVIDIA's Blackwell chips.
Why Are Custom Chips Gaining Ground?
OpenAI is far from alone in building custom silicon. Google has designed its own tensor processing units (TPUs), first announced in 2016, and Amazon has developed its Trainium chips. Research firm TrendForce projects that custom chips will account for 27.8% of AI server shipments in 2026, the highest share since 2023. For the first time, custom chips are expected to grow faster than NVIDIA's GPUs.
The pressure is intensifying globally. In China, Meituan trained its LongCat-2.0 model, a 1.6-trillion-parameter system, entirely on domestic chips without any NVIDIA hardware. Export restrictions on advanced chips have accelerated this trend, pushing companies to develop alternatives.
Yet the real story is more nuanced. Most of the competitive pressure sits at the inference layer, where models are already trained and simply need to process user queries. NVIDIA still dominates model training, where its CUDA software ecosystem has locked in developers since 2006. Custom chips rarely match that flexibility.
What Makes CUDA So Hard to Replace?
CUDA is NVIDIA's parallel computing platform and programming model, together with the toolkit that lets developers write code optimized for NVIDIA hardware. It has become the default foundation for AI development, giving NVIDIA an enormous competitive moat. However, the real advantage goes deeper than the programming interface itself. According to Dylan Patel, founder of SemiAnalysis, a leading semiconductor research firm, the CUDA moat stems from how deeply open-source AI models are optimized for NVIDIA hardware.
"The moat isn't the CUDA programming language; it's the deep optimization of downstream products for NVIDIA hardware," Patel explained.
This optimization happens across the entire AI ecosystem. Open-source models from DeepSeek, Kimi, Alibaba, and Tencent are all co-optimized for NVIDIA GPUs, so when they run on Google's TPUs, performance suffers significantly. The fit also varies among closed models: Anthropic's models are denser and better suited to TPU architecture, while OpenAI's models are sparser and lean toward the GPU path.
How Are Hardware, Software, and Models Working Together?
The real breakthrough in AI efficiency comes not from any single layer, but from co-optimizing hardware, software, and model architecture simultaneously. Patel argued that moving from NVIDIA's Hopper to Blackwell architecture delivered roughly a 30x improvement in inference performance for the same model. But over the past three years, overall intelligence efficiency has improved far more than 30x. Most of those gains came from the model layer, not the chips alone.
When teams co-design across all three layers, the gains multiply in unexpected ways. Instead of stacking 2x improvements from hardware, 2x from software, and 2x from models to get 8x total, co-optimization can yield 100x gains. DeepSeek's Mixture of Experts architecture is the clearest example. The company specifically tuned expert sizes to match NVIDIA's tensor-core dimensions. When that same model runs on a TPU, performance drops significantly because the architecture was never optimized for TPU topology.
- Hardware Layer: NVIDIA's Hopper-to-Blackwell transition delivered a roughly 30x improvement in optimized inference over three years.
- System Software Layer: Frameworks like PyTorch, custom kernels such as FlashAttention, and techniques such as speculative decoding added another multiplicative factor.
- Model Architecture Layer: The shift from dense models like GPT-4 to sparse, mixture-of-experts designs produced the largest single contribution to efficiency gains.
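The stacking arithmetic in this section can be made concrete with a short sketch. The 2x-per-layer gains and the 100x co-optimized figure are the article's illustrative numbers; the function name and the idea of expressing the difference as an "extra factor" are assumptions made here for illustration, not a model published by SemiAnalysis.

```python
from math import prod

def stacked_gain(layer_gains):
    """Combined speedup when independent per-layer gains simply multiply."""
    return prod(layer_gains)

# Illustrative per-layer gains from the text: hardware, software, model.
independent = stacked_gain([2, 2, 2])

# The text's illustrative co-optimized figure. The ratio is the extra
# factor that independent stacking does not account for.
co_optimized = 100
extra_factor = co_optimized / independent

print(independent)   # 8
print(extra_factor)  # 12.5
```

Under these illustrative numbers, co-design would have to contribute a further 12.5x beyond what the three layers deliver on their own.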
How Is NVIDIA Defending Its Inference Position?
NVIDIA is not sitting idle. At its GTC developer conference, the company announced that its upcoming Rubin platform will cut inference cost per token by up to 10x compared with Blackwell. Cheaper inference typically lifts usage and total compute demand, which benefits NVIDIA overall.
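Whether cheaper inference lifts total spending depends on how strongly usage responds. A minimal sketch of that arithmetic follows, assuming total spend is simply cost per token multiplied by tokens served. The 10x cost reduction is the upper bound claimed for Rubin; the usage-growth values and the function name are hypothetical, chosen only for illustration.

```python
def relative_spend(cost_reduction, usage_growth):
    """Total inference spend relative to today (1.0 means unchanged).

    cost_reduction: factor by which cost per token falls (10 = 10x cheaper).
    usage_growth: factor by which tokens served rises (hypothetical input).
    """
    return usage_growth / cost_reduction

# 10x cheaper tokens under three hypothetical usage responses.
for usage_growth in (5, 10, 30):
    print(usage_growth, relative_spend(10, usage_growth))
```

Under this simplification, total spend rises only when usage grows by more than the factor by which cost per token falls.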
The company is also working to keep the market for compute from consolidating. NVIDIA CEO Jensen Huang is aggressively backing emerging cloud providers and AI labs that might not seem like obvious bets. Patel explained that Huang dislikes a world where hyperscale cloud providers monopolize everything. By supporting companies like CoreWeave and Crusoe, NVIDIA fosters a multipolar compute landscape. The aim is to keep any single hyperscaler from gaining too much bargaining power and potentially developing chips that could displace NVIDIA.
"Jensen Huang is basically saying custom chips from Google, Amazon, Microsoft, and Meta struggle to compete with NVIDIA because NVIDIA is singularly focused on AI acceleration at a scale no one else matches," noted Shay Boloor, a stock analyst, on social media in February 2026.
What Does This Mean for the Future of AI Computing?
The compute landscape is shifting, but not in a way that threatens NVIDIA's core business. Yes, inference is moving toward custom chips and optimization. Yes, companies like OpenAI are building their own silicon. But training, which requires massive amounts of compute and flexibility, remains NVIDIA's stronghold. The company still sells every chip it can manufacture.
Looking further ahead, SemiAnalysis founder Patel made bold predictions about the AI infrastructure market. He forecast that AI inference will eventually surpass oil to become one of the world's largest markets, accounting for multiple percentage points of global GDP. By 2030, he predicted, OpenAI and Anthropic alone could collectively possess over 100 gigawatts of compute capacity. By 2040, more than half of all new compute capacity could be deployed in space, driven by terrestrial energy cost constraints.
The real test for NVIDIA is whether its biggest customers can cut it out faster than the market grows. For now, the numbers suggest NVIDIA's dominance in training, combined with its software ecosystem and strategic partnerships, will keep the company at the center of AI infrastructure for years to come.
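That test can be framed as simple arithmetic: NVIDIA's volume is the size of the market multiplied by its share of it. The sketch below assumes exactly that relationship. All input values and function names are hypothetical and chosen for illustration; they are not forecasts from TrendForce or SemiAnalysis.

```python
def nvidia_volume_growth(market_growth, share_before, share_after):
    """Growth factor of NVIDIA's volume when market size and share both change."""
    return market_growth * share_after / share_before

def break_even_market_growth(share_before, share_after):
    """Market growth factor needed for NVIDIA's volume to stay flat."""
    return share_before / share_after

# Hypothetical year: the market grows 30% while NVIDIA's share slips from 80% to 70%.
growth = nvidia_volume_growth(1.3, 0.80, 0.70)
needed = break_even_market_growth(0.80, 0.70)

print(f"NVIDIA volume growth factor: {growth:.3f}")
print(f"Break-even market growth factor: {needed:.3f}")
```

In this hypothetical case NVIDIA's volume still grows despite the share loss, because the market expands faster than the break-even rate.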