Logo
FrontierNews.ai

Why a Former Apple Engineer Just Raised $80 Million to Rebuild AI's Infrastructure Layer

A startup founded by a former Apple and NVIDIA engineer has just secured $80 million to solve one of enterprise AI's most expensive problems: the infrastructure layer wasn't built for AI agents that run autonomously for hours at a time. Sail Research, which emerged from stealth on June 25, is tackling a fundamental mismatch between how AI infrastructure was designed and how companies are actually deploying AI systems today.

What's the Problem With Today's AI Infrastructure?

When most people think of AI, they picture a chatbot answering a quick question. That's what today's AI serving platforms were built for: fast responses to individual requests. But enterprises are increasingly deploying AI agents that operate autonomously, reading entire codebases, screening hundreds of job candidates, or researching complex topics without human intervention. These long-running agents consume tokens at a rate 50 to 500 times higher than simple chat interactions, causing enterprise AI bills to triple even as per-token prices have fallen.

Goldman Sachs forecasts a 24-fold increase in token consumption by 2030, according to Sail's research. The problem is that existing inference platforms, which handle how AI models run on computer chips, optimize for low latency, meaning they prioritize getting you an answer fast. But that's the wrong priority for agents that need to run for hours.

How Does Sail's Approach Differ From Competitors?

Sail Research's solution is an end-to-end infrastructure platform built from the chip level up. Rather than optimizing for speed, Sail optimizes for throughput and efficiency, sacrificing real-time responsiveness to pack far more computing work into every unit of power. The company writes software that orchestrates and optimizes how AI models run on existing chips, functioning like a highly efficient traffic system that tells hardware exactly how to allocate its resources.

CEO and co-founder Neil Movva explained the deliberate tradeoff: "We only care about efficiency. It's quite difficult to build an inference engine for both throughput and latency at the same time. Everyone else is optimizing for latency, and we just care about throughput". Movva, 28, previously worked at NVIDIA, Apple, and Together AI, giving him rare expertise across every meaningful layer of the AI stack.

Neil Movva

The results are striking. Sail claims customers often see between 3x to 10x cost improvements over comparable alternatives. The company's internal benchmarking showed its inference platform achieved 90.72% accuracy on the BrowseComp-Plus evaluation while delivering costs of up to 10 times less than competing alternatives, though comparative performance may vary depending on workloads and deployment environments.

Who's Backing This Bet and Why?

Sail's $80 million funding round values the company at $450 million. Kleiner Perkins led the Series A, while Sequoia Capital led the earlier seed financing. Additional investors include Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A*, and Abstract Ventures, alongside notable angel investors.

The angel investor list underscores the credibility of Sail's thesis. It includes Alphabet Chairman John Hennessy, Intel CEO Lip-Bu Tan, and Together AI Chief Scientist Tri Dao. Aditya Naganath, the Kleiner Perkins partner who led the Series A, had been developing an investing thesis for months before meeting Movva: the next wave of AI wasn't going to be chatbots, but software that does work autonomously for hours at a time across thousands of tasks.

"It felt obvious to both of us that you're going to need a different, specific inference platform built for these long-running agents," Naganath told Fortune.

Aditya Naganath, Partner at Kleiner Perkins

Naganath's bull case is straightforward: "The belief that inference is going to be a 10x, even 100x, bigger market than it is today".

How Is Sail Already Proving Its Technology Works?

Sail launched its inference service in March 2026 and has already ramped to processing trillions of tokens per week. The company is already supporting AI-driven applications at several companies, demonstrating real-world traction.

One early customer, Detail.dev, uses Sail to run code-review agents that spend three to four hours, sometimes longer, digging through entire codebases hunting for bugs that five-minute reviews miss. Movva noted that "the abundance of tokens that we provide lets them be maximally ambitious in how they scan through code bases". Other customers include Parallel Web Systems and Jack and Jill.

Movva

Steps to Understanding Sail's Market Position

  • Competitive Landscape: Together AI, a Kleiner Perkins portfolio company, is a formidable incumbent in the inference space. However, Naganath argues the two companies serve different markets: Together owns the interactive, chat-based market while Sail owns the long-running agent workload. The larger threat may come from frontier AI labs like Anthropic, OpenAI, and Google, which are building their own inference infrastructure and could theoretically commoditize the layer Sail is betting on.
  • Market Timing: Token prices have been flat or rising for six months, demand for compute is growing faster than supply, and the world needs someone focused obsessively on squeezing the most intelligence out of every available GPU. Movva stated: "We feel an emotional pain when we see a GPU be idle or wasted in any way".
  • Technology Compatibility: Sail's platform is compatible with existing OpenAI-based development workflows while supporting a range of leading open-source models, including DeepSeek, Gemma, GLM, Kimi, and Nemotron, allowing customers to integrate Sail's infrastructure without significantly modifying existing AI applications.

What Makes Movva and His Co-Founder Uniquely Positioned?

Co-founder and CTO Samir Menon also comes from Apple, where he worked in security engineering at scale. The two met on the first day of freshman year at Stanford and took the same classes. They reunited in late 2025 to rebuild the inference stack from scratch.

Movva's background is particularly rare. He watched NVIDIA pivot from gaming chips to AI silicon in 2016 and 2017. He then joined Apple to work on the chip powering computer vision on a billion iPhones, but grew frustrated that Apple's ambition topped out at animoji, the animated characters users can apply on FaceTime. From there, he went to Together AI, one of the leading open-source model inference providers, to get back to GPU-level work. What he saw there crystallized Sail's thesis: Together had been built for interactive applications and had made every architectural trade-off accordingly. Long-horizon agents needed something built from scratch with different priorities.

Menon emphasized the infrastructure challenge: "The efficiency gains compound across every layer. Existing inference infrastructure was built around minimizing response time for individual requests rather than sustaining the high-throughput processing required by autonomous agents operating continuously across thousands of concurrent tasks".

Menon

What Does This Mean for the Broader AI Market?

Global AI spending is expected to reach $2.5 trillion in 2026, creating demand for platforms capable of supporting increasingly sophisticated AI applications. Sail's strategy focuses on improving infrastructure efficiency so organizations can deploy larger and longer-running AI workloads without the cost and scalability constraints associated with conventional inference platforms.

The funding comes as investment in artificial intelligence infrastructure continues to accelerate. Investors are increasingly recognizing that the infrastructure layer for the agent era is one of the most important bets in AI right now. Movva's stated mission is clear: "Sail exists to make intelligence abundant. Every decision we make, from the chip level to the API, is about giving teams the tokens, the scale, and the runtime to build agents without limits".