FrontierNews.ai

Jim Keller's Bold Bet: Why Tenstorrent Thinks It Can Beat Cerebras at Half the Cost

Tenstorrent CEO Jim Keller has issued a direct challenge to wafer-scale chip maker Cerebras, asserting that his company's BlackHole Galaxy servers will deliver superior AI inference performance at a fraction of the cost by balancing compute, memory, and data movement more effectively than competitors. The clash between these two companies with radically different technological approaches offers a critical case study for the next phase of the AI computing race, especially as investors grow increasingly skeptical about returns on massive AI infrastructure spending.

What's the Real Difference Between These Two AI Chip Approaches?

The competition between Tenstorrent and Cerebras represents a fundamental disagreement about how to design AI inference hardware. Cerebras builds wafer-scale AI chips, each made from a single 12-inch (300 mm) wafer, and aims to outmatch competitors with extreme single-chip compute density while abandoning DRAM (dynamic random-access memory) entirely in favor of on-chip memory. Tenstorrent takes the opposite approach, building distributed systems in which many chips work together, backed by abundant networking and balanced memory hierarchies.

Keller grounded his confidence in classical computing principles that have guided system design for decades, starting with Rent's Rule. Originating at IBM in the 1960s, Rent's Rule states that the input-output (I/O) connections a logic block needs grow more slowly than the amount of logic inside it. As a result, adding computing power to a single block does not add communication capacity in proportion, and I/O becomes the bottleneck. Keller argues that Cerebras' wafer-scale approach, which concentrates enormous compute in a single piece of silicon, violates this principle, while Tenstorrent's architecture respects it.
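Rent's Rule is usually written as an empirical power law, T = t·g^p, where g is the number of gates in a block, T is the number of external terminals (I/O), t is the average number of terminals per gate, and p is the Rent exponent, which is below 1 (values of roughly 0.5 to 0.75 are commonly cited for random logic). The Python sketch below uses illustrative values t = 4 and p = 0.6; these are assumptions for demonstration, not figures from Keller or from either company's chips. It shows how the I/O available per gate shrinks as a block grows.

```python
def rent_terminals(gates: float, t: float = 4.0, p: float = 0.6) -> float:
    """Rent's Rule: external terminals T = t * g**p for a block of g gates.

    t and p are illustrative assumptions, not measured values for any real chip.
    """
    return t * gates ** p


def io_per_gate(gates: float, t: float = 4.0, p: float = 0.6) -> float:
    """Terminals per gate; falls as the block grows because p < 1."""
    return rent_terminals(gates, t, p) / gates


if __name__ == "__main__":
    for g in (1e3, 1e6, 1e9):
        print(f"gates={g:.0e}  terminals={rent_terminals(g):,.0f}  "
              f"terminals per gate={io_per_gate(g):.5f}")
```

Multiplying the gate count by 1,000 multiplies the terminal count by only 1000^0.6, about 63. That sublinear growth is the scaling Keller is pointing to.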

"The fundamentals of AI computing are rooted in 1970s high-performance computing, and these principles have been well understood for decades," Keller stated, emphasizing that successful AI infrastructure must ultimately return to a balance between compute, memory, and I/O.

Jim Keller, CEO at Tenstorrent

The practical difference shows up in how each system handles key-value (KV) caches, which store the attention keys and values computed for earlier tokens and are essential for fast language model inference. In Tenstorrent's design, the KV cache resides in the DRAM attached to the same chip that performs decoding, enabling rapid decoding without extra processing steps. Architectures like Groq and Cerebras that rely entirely on SRAM (static random-access memory), with no DRAM, have no external memory to stream data from when there are not enough chips to hold everything on-chip, forcing a performance trade-off.
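For standard multi-head or grouped-query attention, the KV cache grows linearly with the number of layers, key/value heads, head dimension, context length, and batch size. The Python sketch below sizes the cache for a hypothetical model (60 layers, 8 key/value heads, head dimension 128, 16-bit values) serving 32 users at an 8,192-token context. It then counts how many chips would be needed if the cache alone had to fit in a hypothetical 0.5 GiB of SRAM per chip. Every number here is an illustrative assumption, not a specification of DeepSeek, Tenstorrent, Groq, or Cerebras hardware. Models that compress their cache (such as DeepSeek's latent attention) need a different formula, and model weights would compete for the same SRAM.

```python
import math

GIB = 1024 ** 3


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for uncompressed attention.

    The leading factor of 2 covers one key tensor and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def chips_to_hold(total_bytes: int, per_chip_bytes: int) -> int:
    """Minimum number of chips whose memory budget can hold total_bytes."""
    return math.ceil(total_bytes / per_chip_bytes)


if __name__ == "__main__":
    # Hypothetical model and workload, chosen only for illustration.
    total = kv_cache_bytes(layers=60, kv_heads=8, head_dim=128,
                           seq_len=8192, batch=32)
    print(f"KV cache: {total / GIB:.1f} GiB")
    # Hypothetical SRAM budget of 0.5 GiB per chip reserved for the KV cache.
    print("chips needed if the cache must live in SRAM:",
          chips_to_hold(total, GIB // 2))
```

With these assumptions the cache comes to 60 GiB, or 120 chips' worth of the assumed SRAM budget. A chip with attached DRAM could instead keep that cache in external memory and stream it in as decoding proceeds.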

How Is Tenstorrent Proving Its Architecture Works in Real Deployments?

Rather than relying on theoretical arguments alone, Keller has backed his claims with concrete performance data and commercial orders. At a recent TT-Deploy event, Tenstorrent demonstrated 16 Galaxy servers, totaling 512 BlackHole chips, running inference on the DeepSeek-671B large language model at a batch size of 32 and generating up to 350 tokens per second per user. A token is roughly four characters of English text, and across all 32 users in the batch that peak rate adds up to about 11,200 tokens per second.
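The demo figures can be broken down further with simple arithmetic. The Python sketch below uses only the reported values (16 servers, 512 chips, batch size 32, up to 350 tokens per second per user) plus the rough four-characters-per-token rule of thumb. Because 350 tokens per second is described as a peak ("up to"), the results are upper bounds rather than sustained rates.

```python
# Figures reported for the TT-Deploy DeepSeek-671B demo.
SERVERS = 16
CHIPS = 512
BATCH_SIZE = 32
PEAK_TOKENS_PER_SEC_PER_USER = 350
CHARS_PER_TOKEN = 4  # rough English rule of thumb, not a tokenizer constant


def demo_throughput() -> dict:
    """Derive aggregate and per-unit peak rates from the reported demo figures."""
    aggregate = BATCH_SIZE * PEAK_TOKENS_PER_SEC_PER_USER
    return {
        "aggregate_tokens_per_sec": aggregate,
        "tokens_per_sec_per_server": aggregate / SERVERS,
        "tokens_per_sec_per_chip": aggregate / CHIPS,
        "chars_per_sec_per_user": PEAK_TOKENS_PER_SEC_PER_USER * CHARS_PER_TOKEN,
    }


if __name__ == "__main__":
    for name, value in demo_throughput().items():
        print(f"{name}: {value:,.3f}")
```

Spread across the hardware, the peak works out to roughly 22 tokens per second per chip and about 1,400 characters per second for each user.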

The commercial momentum is accelerating. Keller revealed that Tenstorrent received a major order for a 96-unit Galaxy cluster, containing 3,072 BlackHole chips, destined for deployment outside the United States. The company is currently manufacturing 1,000 Galaxy servers, with at least half already sold. Ten customers have completed Galaxy system deployments and passed proof-of-concept testing, and Keller indicated that follow-on orders are beginning to arrive.
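The server and chip counts are mutually consistent if each Galaxy holds 32 BlackHole chips, the ratio implied by the demo configuration of 512 chips across 16 servers. That per-server figure is derived here, not stated directly in the article. The Python sketch below applies it to the other numbers Keller gave.

```python
# Figures reported in the article.
DEMO_SERVERS, DEMO_CHIPS = 16, 512
ORDER_SERVERS = 96
SERVERS_IN_PRODUCTION = 1_000

# Assumption: every Galaxy server carries the same number of chips as in the demo.
CHIPS_PER_SERVER = DEMO_CHIPS // DEMO_SERVERS  # 32


def chips_for(servers: int) -> int:
    """Chip count for a given number of Galaxy servers under the assumption above."""
    return servers * CHIPS_PER_SERVER


if __name__ == "__main__":
    print("chips per Galaxy:", CHIPS_PER_SERVER)
    print("96-unit order:", chips_for(ORDER_SERVERS))
    print("1,000 servers in production:", chips_for(SERVERS_IN_PRODUCTION))
    print("chips already sold (at least half):", chips_for(SERVERS_IN_PRODUCTION // 2))
```

Under that assumption, the 96-unit order matches the reported 3,072 chips, and the 1,000 servers in production represent 32,000 chips, of which at least 16,000 are already sold.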

Interestingly, some customers are using Tenstorrent systems to accelerate their existing GPU clusters rather than replacing them entirely. These customers connect Galaxy systems via PCIe cards and Layer 2 Ethernet connections to boost token generation rates by one to two times. Keller noted that this represents a suboptimal use case, since customers would have achieved better economics by purchasing Tenstorrent hardware from the start, but it demonstrates the flexibility of the architecture.

Why Is Cerebras' IPO Actually Good News for Tenstorrent?

Cerebras recently completed its initial public offering and exceeded revenue expectations in its first post-IPO quarterly report. However, the company's stock plunged nearly 12 percent in a single day after guidance indicated that a massive contract with OpenAI would drag down profit margins significantly. This market reaction reflects broader investor skepticism about whether the enormous capital expenditures on AI infrastructure will generate acceptable returns.

Keller viewed Cerebras' rising valuation not as a threat but as validation of the market opportunity. He stated bluntly that Tenstorrent's lower cost structure gives it a decisive advantage in a market where customers are increasingly price-conscious. He also pointed to supply constraints at Nvidia, noting that many customers with $100 million orders cannot receive shipments for a year, forcing them to purchase $20 million Tenstorrent systems as a more affordable alternative.

"Cerebras' IPO helps us, especially because we will beat them on all fronts. Challenge accepted!" Keller declared, asserting that Tenstorrent can achieve performance superiority with large-scale BlackHole Galaxy deployments at a hardware cost far lower than Cerebras.

Jim Keller, CEO at Tenstorrent

What Are the Key Factors Driving Tenstorrent's Competitive Strategy?

  • System Architecture Advantage: Tenstorrent's Galaxy servers feature 56 Ethernet ports per chassis, compared to only 8 external ports on traditional GPU servers, enabling efficient distribution of large language model computations across hundreds of chips for parallel processing.
  • Memory Hierarchy Balance: The company's approach maintains a careful balance of DRAM, SRAM, compute resources, matrix and vector operations, and on-chip networks, which Keller says allows it to respect classical computing principles like Rent's Rule that competitors overlook.
  • Cost Competitiveness: By promising superior performance at significantly lower hardware costs, Tenstorrent is courting customers who face supply constraints from larger competitors or who want to optimize their infrastructure spending in a market increasingly focused on return on investment.
  • Ecosystem Flexibility: The architecture supports both standalone deployments and integration with existing GPU infrastructure, giving customers multiple pathways to adopt the technology without stranding previous investments.

What's Next for Tenstorrent Beyond AI Inference Chips?

Keller's ambitions extend beyond AI inference hardware. The company has developed its own CPU intellectual property based on RISC-V, a royalty-free, open instruction set architecture that lets companies design their own processors without paying licensing fees to established players. Keller confirmed he has met with the CEOs of Intel and Qualcomm and is pitching this hardware IP to all the major hyperscale computing companies.

One hyperscale computing company is currently evaluating Tenstorrent's AI IP to build small AI chips for edge devices. Keller also invoked Amdahl's Law, another classical computing principle, to explain emerging growth opportunities for CPUs. The law states that the overall speedup of any workload is limited by the portion that cannot be accelerated. Keller argued that agentic AI, in which AI systems autonomously plan and execute tasks, is creating new CPU demand because AI has finally become fast enough that the CPU bottleneck is becoming visible.
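Amdahl's Law is usually written as S = 1 / ((1 - p) + p / s), where p is the fraction of run time that can be accelerated and s is the speedup applied to that fraction. The Python sketch below assumes a hypothetical agentic workload in which 90 percent of the time is accelerator-bound model inference and 10 percent is CPU-bound work such as orchestration and tool calls. That split is an illustration, not a figure from Keller.

```python
def amdahl_speedup(accel_fraction: float, accel_factor: float) -> float:
    """Overall speedup when only accel_fraction of run time is sped up by accel_factor."""
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_factor)


def serial_share(accel_fraction: float, accel_factor: float) -> float:
    """Fraction of the new run time spent in the part that was not accelerated."""
    serial = 1.0 - accel_fraction
    return serial / (serial + accel_fraction / accel_factor)


if __name__ == "__main__":
    p = 0.9  # hypothetical: 90% inference, 10% CPU-side agent work
    for s in (1, 10, 100, 1000):
        print(f"accelerator {s:>4}x faster -> overall {amdahl_speedup(p, s):5.2f}x, "
              f"CPU share {serial_share(p, s):6.1%}")
```

However fast the accelerator becomes, the overall speedup in this example can never reach 1 / 0.1 = 10x. Once inference is 100 times faster, the CPU-side work accounts for over 90 percent of the remaining run time, which is the point where the CPU becomes the visible bottleneck.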

Keller also disclosed that Tenstorrent is advancing its IPO plans, signaling confidence in the company's trajectory and its ability to compete against better-funded rivals. The combination of proven inference performance, growing commercial orders, and plans to expand into CPU IP and edge AI applications suggests that the battle between Tenstorrent and Cerebras will shape how companies deploy AI infrastructure for years to come.