The Inference Chip Race Just Got Crowded: Why SambaNova's Partnership With General Compute Could Reshape AI
A new inference cloud startup called General Compute has raised $15 million and secured $300 million worth of specialized chips from SambaNova, positioning itself as a challenger to Nvidia and Groq in the race to power AI models running in production. The company's strategy reveals a fundamental shift in how the AI industry is organizing itself: instead of relying on general-purpose graphics processing units (GPUs), companies are increasingly turning to chips designed specifically for inference, the phase when trained AI models respond to user queries.
What's the Difference Between Training and Inference Chips?
The demand for computing power to run AI models has exploded, but the computational requirements for training a model differ significantly from running it in production. Training requires massive parallel processing power to adjust billions of parameters across datasets. Inference, by contrast, prioritizes speed and efficiency when a model is actively generating responses to users. This distinction has sparked an entirely new category of specialized hardware.
General Compute's co-founders, CEO Finn Puklowski and CTO Jason Goodison, recognized that capacity at major players like Nvidia and Cerebras was becoming constrained. They turned to SambaNova, an Intel-backed chipmaker that has quietly built inference-focused processors. SambaNova's new chips, set to release this year, promise significant performance improvements over both GPUs and competing specialized chips from companies like Groq and Cerebras.
How Do SambaNova's Chips Outperform the Competition?
SambaNova's architecture offers two critical advantages that address real infrastructure challenges facing data centers. First, the chips are designed to generate between 600 and 700 tokens per second, compared to approximately 250 tokens per second for GPUs. A token is roughly equivalent to a word or small piece of text, so at these rates the same response would be generated roughly 2.4 to 2.8 times faster.
Second, and perhaps more practically, SambaNova's chips are air-cooled rather than water-cooled and consume less power overall. This matters enormously for deployment because it means companies can install them in existing data center facilities without expensive infrastructure upgrades. General Compute is pursuing colocation deals, where it installs hardware in facilities owned by others, including data centers and even cryptocurrency mining operations looking to repurpose their infrastructure.
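To make the throughput figures quoted above concrete, here is a minimal Python sketch that converts tokens per second into generation time for a single response. The 1,000-token response length is a hypothetical value chosen for illustration, and the calculation assumes a steady decode rate with no queueing, network, or prompt-processing overhead, so real-world latency would differ.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady decode rate (ignores all other overhead)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Rates quoted in the article.
GPU_TPS = 250.0
SAMBANOVA_TPS_RANGE = (600.0, 700.0)

# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 1_000

gpu_seconds = generation_seconds(RESPONSE_TOKENS, GPU_TPS)
chip_seconds = [generation_seconds(RESPONSE_TOKENS, tps) for tps in SAMBANOVA_TPS_RANGE]
speedups = [tps / GPU_TPS for tps in SAMBANOVA_TPS_RANGE]

print(f"GPU at {GPU_TPS:.0f} tok/s: {gpu_seconds:.2f} s")
for tps, secs, ratio in zip(SAMBANOVA_TPS_RANGE, chip_seconds, speedups):
    print(f"Chip at {tps:.0f} tok/s: {secs:.2f} s ({ratio:.1f}x the GPU rate)")
```

Under these assumptions, the quoted rates work out to a 2.4x to 2.8x throughput advantage for the specialized chips.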
Key Elements of the Inference Cloud Business Model
- Chip Selection: Companies must choose between general-purpose GPUs and specialized inference chips, each with different performance and cost tradeoffs for production AI workloads.
- Infrastructure Placement: Inference clouds can deploy hardware in existing facilities through colocation agreements rather than building new data centers, reducing capital requirements and time to market.
- Customer Optimization: Inference clouds compete on speed and cost per token, allowing customers to route requests to whichever provider offers the best combination for their specific use case.
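The routing idea in the last bullet can be sketched in a few lines of Python. This is an illustrative sketch only: the provider names, throughput figures, and prices below are hypothetical placeholders, not data from any real service, and the selection rule (cheapest provider that meets a minimum speed) is one assumed policy among many a customer might use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    tokens_per_second: float
    usd_per_million_tokens: float


def pick_provider(providers: list[Provider], min_tokens_per_second: float) -> Provider:
    """Return the cheapest provider that meets the required decode speed."""
    eligible = [p for p in providers if p.tokens_per_second >= min_tokens_per_second]
    if not eligible:
        raise ValueError("no provider meets the requested speed")
    return min(eligible, key=lambda p: p.usd_per_million_tokens)


# Hypothetical catalog: names, speeds, and prices are made up for illustration.
CATALOG = [
    Provider("provider-a", tokens_per_second=250.0, usd_per_million_tokens=0.30),
    Provider("provider-b", tokens_per_second=650.0, usd_per_million_tokens=0.60),
    Provider("provider-c", tokens_per_second=400.0, usd_per_million_tokens=0.45),
]

# A batch job that tolerates slow output picks on price alone.
batch_choice = pick_provider(CATALOG, min_tokens_per_second=200.0)

# A latency-sensitive agent needs a faster provider and pays more for it.
agent_choice = pick_provider(CATALOG, min_tokens_per_second=500.0)

print(batch_choice.name, agent_choice.name)
```

The same request stream can land on different providers depending on how much speed the workload needs, which is why providers in this market compete on both variables at once.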
General Compute launched its cloud offering last week and claims it is already the fastest at running MiniMax 2.7, a powerful open-source large language model (LLM). An LLM is an AI system trained on vast amounts of text data to understand and generate human language.
Why Are Venture Investors Betting on This Strategy?
Joe Hasselmann, a venture investor who backed Groq in 2021, launched a new fund called Evercrest Capital Partners focused on AI infrastructure. He made General Compute one of his first investments and sees parallels to successful partnerships in the AI ecosystem. He noted that SambaNova and General Compute are making mutual bets on each other's success, similar to how Nvidia benefited from partnerships with cloud providers like CoreWeave.
"They do need a healthy mix of customers that are going to put their chips in environments that are going to have high growth to them. As much as General Compute is making a bet on SambaNova, SambaNova is making a bet on General Compute," said Joe Hasselmann, venture investor and founder of Evercrest Capital Partners.
The broader question underlying these investments is which computing architecture will capture the most value in AI's future. Inference clouds implicitly bet on a world where multiple AI models and agents coexist, no single provider dominates, and speed and cost of inference become the primary competitive variables. This contrasts with a scenario where one or two companies control most AI inference capacity.
What Real-World Problems Does Faster Inference Solve?
The practical implications of faster inference extend beyond raw speed metrics. Puklowski explained that the goal is to transform hour-long workloads for coding agents into five- or ten-minute tasks. For audio agents handling customer service, faster inference becomes essential because these systems need to respond quickly enough to maintain natural conversation flow. If an AI takes too long to respond, the user experience deteriorates significantly.
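As a quick check on what that goal implies, the following Python sketch computes the end-to-end speedup needed to shrink a one-hour workload to ten or five minutes. It only restates the article's numbers as ratios; it makes no claim about how much of that gain would come from the chips themselves versus other parts of an agent's pipeline.

```python
def implied_speedup(baseline_minutes: float, target_minutes: float) -> float:
    """Ratio by which total runtime must shrink to hit the target."""
    if baseline_minutes <= 0 or target_minutes <= 0:
        raise ValueError("durations must be positive")
    return baseline_minutes / target_minutes


BASELINE_MINUTES = 60.0        # "hour-long workloads"
TARGET_MINUTES = (10.0, 5.0)   # "five- or ten-minute tasks"

speedups_needed = [implied_speedup(BASELINE_MINUTES, t) for t in TARGET_MINUTES]

for target, ratio in zip(TARGET_MINUTES, speedups_needed):
    print(f"{BASELINE_MINUTES:.0f} min -> {target:.0f} min requires a {ratio:.0f}x speedup")
```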
"If you use ChatGPT and it gives you 50 tokens per second, that's still a heck of a lot faster than we can read. Now that things have moved to agent-to-agent, where agents are out there reading on our behalf or pinging databases, they need to go faster," said Finn Puklowski, CEO of General Compute.
The inference chip market is heating up precisely because the economics of AI deployment are shifting. As companies move beyond proof-of-concept projects to production systems serving real users, the cost and speed of inference become critical business metrics. General Compute's $15 million seed round at a $60 million post-money valuation, led by FUSE VC with participation from Carya Venture Partners and Village Global Ventures, reflects investor confidence that specialized inference infrastructure will be a significant business.
The broader AI ecosystem is also watching how this plays out. OpenRouter, a platform that lets customers access multiple AI models to optimize their spending, raised $113 million in Series B funding this week, further validating the thesis that speed and cost efficiency in inference will drive competitive advantage. As AI moves from research labs into production systems, the infrastructure layer supporting inference is becoming just as important as the models themselves.