Video AI Is Splitting Into Two Camps: Here's Why That Matters for Developers

FrontierNews.ai AI Research Desk

The video AI market is no longer a single race to build the best generative model. Instead, it's fracturing into two distinct categories: companies competing to generate new video from scratch, and companies building tools to search, understand, and reason over footage that already exists. Two major announcements this week underscore how this split is reshaping where investors are placing their bets and what developers actually need to build products faster.

What's Driving the Split Between Video Generation and Video Understanding?

TwelveLabs, a San Francisco- and Seoul-based startup, just raised $100 million in Series B funding to scale its video understanding models, Marengo and Pegasus, which index and query existing footage across up to two hours of continuous context. Unlike OpenAI's Sora, Google's Veo, or Runway, which generate entirely new video, TwelveLabs treats video understanding as a fundamentally different problem. CEO and co-founder Jae Lee framed the bet as a wager that the intelligence layer that composes underlying models will retain its value as the models themselves commoditize.

"The goal is durable intelligence layer value as underlying models commoditize," noted CEO Jae Lee, who previously worked as a data scientist for South Korea's Ministry of National Defence.

The funding round, co-led by NEA and NAVER Ventures with participation from Amazon, Radical Ventures, Korea Investment Partners, Index Ventures, Quadrille Capital, and Red Bull Ventures, brings TwelveLabs' total funding to roughly $150 million. The company has grown to approximately 178 employees as of June 2026, up from about 58 a year earlier, signaling rapid scaling in response to enterprise demand.

How Are Cloud Providers Reshaping the Video AI Landscape?

One of the most consequential details in TwelveLabs' Series B may not be the funding amount itself, but rather who wrote the check and what they got in return. Amazon, already a repeat investor in the company, used the round to formalize AWS as TwelveLabs' preferred cloud provider, with new models optimized for AWS Trainium chips launching there first. This mirrors a broader pattern emerging across AI infrastructure deals in which cloud providers trade investment dollars for locked-in compute commitments and product roadmap influence rather than pure equity upside.

For teams evaluating video AI vendors, this dynamic matters as much as the funding total itself. The arrangement signals that cloud providers are increasingly willing to invest capital to secure long-term compute spend and influence over which startups get preferential access to accelerator hardware. Amazon reportedly structured a similar arrangement with AI video lab Odyssey earlier in 2026, suggesting this is becoming a standard playbook.

What Does the Unified API Approach Mean for Developers?

Meanwhile, a different kind of consolidation is happening on the developer tooling side. Pollo AI launched Pollo API, a unified API platform that gives developers access to more than 300 AI video and image models behind a single endpoint. Rather than compete directly with generative video leaders, Pollo AI is positioning itself as a neutral aggregator, allowing developers to choose the right model for each use case without managing separate provider integrations.

The platform supports model families including Veo, Kling AI, Sora, GPT Image, Nano Banana, Runway, Hailuo, and Pollo's own in-house Pollo 2.0 model, plus multiple versions and variants of each. Bill Zhu, CEO of Pollo AI, explained the reasoning behind the product.

"Developers often want the freedom to choose the right model for each project without managing multiple integrations. Pollo API was built to make that process simpler and more efficient," said Bill Zhu.

Pollo API supports a range of workflows including generation, editing, enhancement, and effects, with API key access, task-based generation, status polling, logs, webhooks, and direct USD pricing. The platform lists per-model pricing, for example $0.06 per second for Pollo 2.0 and $0.066 per second for Kling 3.0.
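
The task-based pattern described here (submit a job, then check its status until it finishes) can be sketched in Python without tying it to any one provider. The helper below is a minimal sketch under stated assumptions: the status strings "succeeded" and "failed", the "status" key, and the `fetch_status` callable are all hypothetical stand-ins, not Pollo API's documented contract. In practice `fetch_status` would wrap whatever status request the provider actually exposes.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal states; a real provider's task states may differ.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    interval_s: float = 2.0,
    max_interval_s: float = 30.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_status until the task reaches a terminal state.

    fetch_status is any zero-argument callable returning a dict with a
    "status" key (an assumption for this sketch). The wait between calls
    doubles each time, capped at max_interval_s, so long-running video
    jobs do not generate a flood of status requests.
    """
    waited = 0.0
    delay = interval_s
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATES:
            return task
        # Give up before sleeping past the overall time budget.
        if waited + delay > timeout_s:
            raise TimeoutError(
                f"task still {task.get('status')!r} after {waited:.0f}s"
            )
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_interval_s)


if __name__ == "__main__":
    # Simulated task lifecycle standing in for real status responses.
    responses = iter(
        [{"status": "queued"}, {"status": "processing"}, {"status": "succeeded"}]
    )
    result = poll_until_done(lambda: next(responses), sleep=lambda s: None)
    print(result["status"])
```

Where a provider offers webhooks, as Pollo API is reported to, a callback can replace this loop entirely; polling with a capped backoff is the fallback when the caller cannot expose a public endpoint.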

How to Evaluate Video AI Tools for Your Project

Clarify Your Use Case: Determine whether you need to generate new video from scratch or search, index, and reason over existing footage. Generation-focused tools like Sora and Veo serve different needs than understanding-focused platforms like TwelveLabs' Marengo and Pegasus models.
Assess Integration Overhead: If you plan to experiment with multiple models, unified API platforms like Pollo API reduce engineering effort by eliminating the need to maintain separate provider integrations. However, validate latency, reliability, and real-world cost under your own workloads rather than relying on vendor claims.
Consider Cloud Lock-in: If you choose a startup backed by a cloud provider, understand that your compute spend may be tied to that provider's infrastructure. Evaluate whether that arrangement aligns with your long-term architecture and cost optimization strategy.
Monitor Output Consistency: When using aggregated APIs spanning hundreds of models, test how output quality, latency, and cost vary across model versions. Vendor-published figures should be validated before production adoption.
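
One way to act on the checklist above is a small comparison harness that times each candidate model under your own workload and prices the same clip length for each. The Python sketch below is illustrative only: the `runners` callables are hypothetical placeholders for real generation requests, and the per-second prices are the listed figures quoted earlier in this article, which may change. Output quality still has to be judged separately, since it cannot be reduced to a timer.

```python
import statistics
import time
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, List


@dataclass
class ModelResult:
    model: str
    median_latency_s: float
    cost_usd: Decimal


def compare_models(
    runners: Dict[str, Callable[[], None]],
    price_per_second: Dict[str, Decimal],
    clip_seconds: int,
    trials: int = 3,
) -> List[ModelResult]:
    """Time each runner several times and price a clip of clip_seconds.

    runners maps a model name to a zero-argument callable that performs
    one generation request (hypothetical; supply your own). Results are
    returned cheapest first.
    """
    results = []
    for model, run in runners.items():
        latencies = []
        for _ in range(trials):
            start = time.perf_counter()
            run()
            latencies.append(time.perf_counter() - start)
        results.append(
            ModelResult(
                model=model,
                median_latency_s=statistics.median(latencies),
                # Decimal avoids float rounding drift in currency math.
                cost_usd=price_per_second[model] * clip_seconds,
            )
        )
    return sorted(results, key=lambda r: r.cost_usd)


if __name__ == "__main__":
    # Listed prices from the article; the no-op runners are stand-ins.
    prices = {"Pollo 2.0": Decimal("0.06"), "Kling 3.0": Decimal("0.066")}
    runners = {name: (lambda: None) for name in prices}
    for r in compare_models(runners, prices, clip_seconds=10):
        print(f"{r.model}: ${r.cost_usd} per clip, median {r.median_latency_s:.4f}s")
```

At the listed rates, a 10-second clip works out to $0.60 on Pollo 2.0 and $0.66 on Kling 3.0, a 10 percent difference that only matters once latency and quality have been compared on the same prompts.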

Why This Split Matters for the Broader AI Video Market

The divergence between generation-focused and understanding-focused tools signals that video AI is maturing into distinct product categories rather than remaining a single competitive space. Industry estimates cited in reporting on TwelveLabs put the AI video search market at roughly $3.2 billion by 2028, suggesting substantial enterprise demand for indexing and reasoning over unstructured video, including surveillance archives, sports libraries, and broadcast tape.

For practitioners, this split underscores a widening strategic choice: companies building generative video tools are competing on model quality and speed, while companies building understanding and retrieval tools are competing on the ability to extract value from footage that already exists. As underlying generative models become more commoditized, the intelligence layer that reasons over video data may prove more defensible and valuable in the long term.

The funding announcements also reveal how cloud providers are using investment as a lever to secure compute commitments and influence product roadmaps. Teams choosing between video AI vendors should account for these dynamics alongside technical capabilities and pricing when evaluating long-term partnerships.
