FrontierNews.ai

Why AI's Biggest Bottleneck Isn't Computing Power, It's Speed: How a $220 Million Chip Startup Is Solving the Month-Long Wait

Fractile, a London-based AI chip startup, just raised $220 million to tackle a problem most people don't realize exists: AI models are incredibly smart, but they're painfully slow at actually thinking through complex problems. Current frontier models generate roughly 40 tokens per second on standard graphics processing unit (GPU) hardware, where a token is about three-quarters of a word. For advanced reasoning tasks requiring tens of millions of tokens, that means waiting weeks or even a month for a single answer. Fractile's new inference chips could compress that timeline to a single day by running 25 to 100 times faster than today's hardware.

What's Actually Slowing Down AI Right Now?

The constraint holding back AI isn't the quality of the models themselves. Today's frontier models are remarkable. The real bottleneck is architectural and surprisingly mundane: it's the wire connecting the processor to memory. Conventional AI accelerators, including Nvidia's H100, H200, and Blackwell GPUs, store model parameters in high-bandwidth memory chips physically separated from the processor. Every computation requires reading data from memory across a connection with fixed maximum bandwidth. As models have grown larger and context windows have expanded, the amount of data needing transfer per computation has outpaced improvements in memory bandwidth.
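To see why the wire, not the processor, sets the ceiling, consider that in a bandwidth-bound decoder every generated token requires streaming essentially all model weights from memory at least once. A back-of-the-envelope sketch, using illustrative figures (a hypothetical 70-billion-parameter model in 16-bit precision and a roughly 3.35 TB/s high-bandwidth memory part; neither number comes from Fractile or Nvidia):

```python
# Back-of-the-envelope ceiling for a memory-bandwidth-bound decoder: each
# generated token streams the full set of weights from memory once, so peak
# tokens/sec is capped at roughly bandwidth / (params * bytes per parameter).
# All figures below are illustrative assumptions, not vendor specifications.

def tokens_per_second_ceiling(params: float, bytes_per_param: float,
                              bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed when weight streaming dominates."""
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes

# Hypothetical 70B-parameter model at 16-bit (2-byte) precision,
# on a part with ~3.35 TB/s of memory bandwidth.
ceiling = tokens_per_second_ceiling(70e9, 2, 3.35e12)
print(f"Ceiling: ~{ceiling:.0f} tokens/sec")
```

Under these assumptions the ceiling lands in the low tens of tokens per second, consistent with the ~40 tokens per second the article cites, regardless of how fast the processor's arithmetic units are.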

Walter Goodwin, Fractile's founder and an Oxford-trained engineer, framed the problem with mathematical precision. At 40 tokens per second, generating one million tokens takes approximately seven hours. Generating ten million tokens takes three days. Some advanced workloads, including complex multi-step reasoning and long-context document analysis, already require tens of millions of tokens. At current speeds, those workloads take weeks or months to complete.
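The arithmetic above is easy to reproduce. The sketch below uses a 100-million-token workload as an illustrative stand-in for "tens of millions of tokens"; that figure is ours, not Fractile's:

```python
# Reproducing the article's arithmetic: wall-clock time to generate a given
# number of tokens at a fixed serial decode rate. The 100M-token workload is
# an illustrative assumption, not a figure from Fractile.

def generation_time_hours(tokens: float, tokens_per_second: float) -> float:
    """Hours of wall-clock time to decode `tokens` at `tokens_per_second`."""
    return tokens / tokens_per_second / 3600

print(f"1M tokens at 40 tok/s:      {generation_time_hours(1e6, 40):.1f} hours")
print(f"10M tokens at 40 tok/s:     {generation_time_hours(1e7, 40) / 24:.1f} days")
print(f"100M tokens at 40 tok/s:    {generation_time_hours(1e8, 40) / 24:.1f} days")
print(f"100M tokens at 1,200 tok/s: {generation_time_hours(1e8, 1200) / 24:.1f} days")
```

One million tokens at 40 tokens per second comes out just under seven hours, ten million at just under three days, and one hundred million at nearly a month; at Fractile's targeted 1,200 tokens per second, that month-long job finishes in about a day.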

"The technical and economic limits on inference speed, above all from memory bandwidth that has failed to scale on current architectures, are what is constraining progress," Goodwin explained.


How Does Fractile's Solution Actually Work?

Fractile's approach eliminates the bottleneck by performing computations directly inside memory cells rather than shuttling data back and forth. Its in-memory compute architecture performs matrix multiplications inside static random-access memory (SRAM) cells alongside the compute logic, removing most of the dynamic random-access memory (DRAM) dependence that currently constrains inference cost. The result, according to Fractile's benchmarks, is chips running frontier models between 25 and 100 times faster than current GPU setups, at approximately one-tenth the cost per token.

The company's target is ambitious: 1,200 tokens per second compared to 40 today. That speed increase would compress workloads currently taking a month into a single day. Whether those numbers hold under production conditions is the key technical question the $220 million funding round is designed to answer by building and delivering the first chips.

Why This Matters for AI Companies and Customers

The Series B round was co-led by Accel, Factorial Funds, and Peter Thiel's Founders Fund, with participation from Conviction, Gigascale, O1A, Felicis, Buckley Ventures, and 8VC. Former Intel Chief Executive Officer Pat Gelsinger, who invested in Fractile in January 2025, joined the Series B as an angel investor and operating adviser.

The most commercially significant detail emerged separately: The Information reported in May that Anthropic had held discussions with Fractile regarding the purchase of inference chips when hardware becomes available in 2027. Anthropic pays hundreds of millions of dollars annually for compute to generate Claude model responses. A chip delivering inference at one-tenth the cost per token is commercially compelling at that scale. While talks are early and unconfirmed, the direction is clear.

Understanding Inference Chip Economics

  • Current Bottleneck: Memory bandwidth between processors and storage limits how fast AI models can generate outputs, not the speed of computation itself.
  • Cost Impact: Inference chips that reduce memory dependence could cut per-token costs by roughly 90 percent, making large-scale AI applications economically viable.
  • Speed Gains: Fractile's target of 1,200 tokens per second versus today's 40 represents a 30-fold improvement that could transform month-long workloads into day-long tasks.
  • Customer Demand: Companies like Anthropic that spend hundreds of millions annually on inference compute have strong financial incentives to adopt faster, cheaper hardware.
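The last bullet can be made concrete with a rough savings sketch, assuming the claimed ~10x lower cost per token holds and token volume stays constant. The annual spend figure is a hypothetical round number in the "hundreds of millions" range the article cites, not Anthropic's actual budget:

```python
# Rough annual-savings sketch for a heavy inference customer, assuming the
# claimed ~1/10th cost per token holds at constant workload. The $300M spend
# is a hypothetical illustration, not any company's real budget.

def annual_savings(current_spend: float, cost_ratio: float) -> float:
    """Savings if the same token volume costs `cost_ratio` of today's price."""
    return current_spend * (1 - cost_ratio)

spend = 300e6  # hypothetical $300M/year inference budget
print(f"Savings at 1/10th cost per token: ${annual_savings(spend, 0.1) / 1e6:.0f}M/year")
```

At that scale, a tenth-of-the-cost chip would free up hundreds of millions of dollars a year, which is why even early, unconfirmed talks with a buyer like Anthropic are commercially significant.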

Fractile announced in February 2026 that it would invest approximately $135 million to bolster UK operations over three years, expanding its London and Bristol sites and creating a new hardware engineering facility in Bristol. The Series B funds accelerate that commitment alongside development and commercialization of the first silicon chips and compute systems for enterprise customers.

The UK government responded with characteristic official enthusiasm. AI Minister Kanishka Narayan called the deal "a strong vote of confidence in British AI," adding that it shows "UK companies at the cutting edge are pulling in global investment while anchoring high value jobs and expertise here at home." A British chip startup raising $220 million from American tier-one venture capital is among the most concrete signs yet of the UK's national AI ambitions materializing.

The timing reflects a broader shift in AI infrastructure. Just as the early internet era's constraint was bandwidth and the industry solved it with fiber, content delivery networks, and compression algorithms, AI is approaching a similar inflection point. The constraint is no longer model quality but inference speed and cost. Fractile's in-memory compute architecture represents one approach to solving that constraint. Whether it succeeds at production scale will determine whether the company becomes a foundational player in AI infrastructure or remains a promising experiment. The $220 million bet suggests investors believe the former is far more likely.
