ByteDance's AI Agent Play: How Volcano Engine Captured Half of China's AI Market
ByteDance's cloud division, Volcano Engine, is dominating China's artificial intelligence market by combining ultra-low token prices with engineering efficiency, capturing nearly half of the country's model-as-a-service (MaaS) market despite intense new competition. The company's strategy hinges on three core advantages: massive scale, sophisticated inference optimization, and a growing ecosystem of AI agents that consume tokens at accelerating rates. As of March 2025, ByteDance's Doubao large language models processed 120 trillion tokens daily, more than doubling in just three months.
Why Is ByteDance Winning When Everyone Else Is Cutting Prices Too?
When Volcano Engine launched its MaaS service in May 2024, it shocked the industry by cutting token prices to 99.3% below the prevailing market rate. Competitors quickly matched those prices, yet Volcano Engine's market share didn't shrink. Instead, it grew. According to market research firm IDC, Volcano Engine's share rose from 49.2% in the first half of 2025 to 49.5% for the full year, even as nearly every major Chinese cloud vendor and AI company entered the market with aggressive pricing of their own.
The counterintuitive result reveals a fundamental truth about cloud computing economics: low prices alone don't guarantee market dominance. What matters is whether a company can sustain those prices profitably. For Volcano Engine, that sustainability comes from three interconnected advantages.
How Does Scale Create an Unbeatable Cost Advantage?
Cloud computing has a distinctive economic structure. Building servers, networks, research teams, and operations systems requires enormous upfront investment, but the average cost per token falls sharply as volume increases. Volcano Engine's massive token volume allows it to spread these fixed costs across far more requests, creating a cost structure that competitors with smaller volumes cannot match.
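The amortization argument can be illustrated with a minimal sketch. The fixed cost, marginal cost, and volumes below are hypothetical placeholders chosen for illustration, not Volcano Engine figures; the function name is likewise an assumption.

```python
def cost_per_million_tokens(fixed_cost, marginal_cost_per_million, millions_of_tokens):
    """Average cost per million tokens: amortized fixed cost plus marginal cost."""
    return fixed_cost / millions_of_tokens + marginal_cost_per_million


# Hypothetical inputs (not taken from the article).
FIXED_COST = 1_000_000.0   # upfront spend on servers, networks, teams per period
MARGINAL_COST = 0.10       # incremental cost of serving one more million tokens

for volume in (1_000, 100_000, 10_000_000):
    average = cost_per_million_tokens(FIXED_COST, MARGINAL_COST, volume)
    print(f"{volume:>12,} million tokens -> {average:.4f} per million")
```

Under these assumed numbers the average cost approaches the marginal cost as volume grows, which is the sense in which a higher-volume provider can sustain a lower price.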
Consider the difference between optimizing efficiency across 10,000 servers versus one million servers. The same engineering gain is worth proportionally more on the larger fleet. Volcano Engine president Tan Dai explained the principle: "Optimizing utilization by one percentage point across 10,000 servers and doing so across one million servers creates a 100-fold difference in returns. You can build a strong team to do it better."
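The arithmetic behind the 100-fold figure is simple proportionality. The sketch below expresses a utilization gain as server-equivalents of recovered capacity; that framing and the function name are assumptions made for illustration.

```python
def servers_freed(fleet_size, utilization_gain_points):
    """Capacity recovered, in server-equivalents, by a utilization gain in percentage points."""
    return fleet_size * utilization_gain_points / 100


small_fleet = servers_freed(10_000, 1)       # one point of utilization on 10,000 servers
large_fleet = servers_freed(1_000_000, 1)    # the same one point on one million servers

print(small_fleet, large_fleet, large_fleet / small_fleet)
```

The same one-point improvement recovers 100 server-equivalents on the smaller fleet and 10,000 on the larger one, so a fixed-size optimization team pays back 100 times more at the larger scale.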
This scale advantage extends beyond simple math. Volcano Engine made token consumption its core business metric and restructured sales incentives to prioritize token volume over traditional cloud service revenue. For MaaS products with the same sales value, internal incentive weights were several times higher than for conventional cloud services. This organizational focus accelerated token volume growth, which in turn deepened the company's cost advantage.
What Technical Innovations Keep Costs Falling?
Behind Volcano Engine's low prices are sophisticated inference optimization techniques that reduce the computational resources required to generate each token. The company deployed two key technologies at scale relatively early, prefill-decode (PD) disaggregation and key-value (KV) cache management, and paired them with differentiated pricing:
- Prefill-Decode Disaggregation: This technique separates the "understanding the question" phase of inference, called prefill, from the "generating the answer" phase, called decode. Each process is then matched with computing units best suited to that task, improving overall efficiency.
- Key-Value Cache: This stores the intermediate attention states (keys and values) computed for earlier tokens, so the model does not recompute the prior context every time it produces new content. The result is less redundant GPU work per generated token and lower inference costs.
- Differentiated Pricing: Volcano Engine offers pricing based on context-length ranges, giving customers more choice and flexibility in how they use the service.
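The first two techniques can be illustrated with a toy sketch. This is not Volcano Engine's implementation: `ToyModel`, `project_kv`, and `next_token` are hypothetical stand-ins for a transformer layer, and the sketch only counts how many key/value projections are computed. In a disaggregated deployment, `prefill` and `decode` would run on separate worker pools with the cache handed from one to the other; here they run in one process.

```python
class ToyModel:
    """Hypothetical stand-in for a transformer layer; counts key/value projections."""

    def __init__(self):
        self.kv_projections = 0

    def project_kv(self, token):
        self.kv_projections += 1
        return (token * 31 % 97, token * 17 % 89)  # placeholder key and value


def next_token(kv_pairs):
    # Placeholder for attention plus sampling: a deterministic function of all K/V pairs.
    return sum(k * v for k, v in kv_pairs) % 1000


def generate_without_cache(model, prompt, n_new):
    """Baseline: recompute K/V for the entire context at every step."""
    tokens = list(prompt)
    for _ in range(n_new):
        kv_pairs = [model.project_kv(t) for t in tokens]
        tokens.append(next_token(kv_pairs))
    return tokens[len(prompt):]


def prefill(model, prompt):
    """Prefill phase: process the whole prompt once and build the KV cache."""
    return [model.project_kv(t) for t in prompt]


def decode(model, kv_cache, n_new):
    """Decode phase: emit one token per step, projecting only the newest token."""
    generated = []
    for _ in range(n_new):
        token = next_token(kv_cache)
        generated.append(token)
        kv_cache.append(model.project_kv(token))
    return generated


prompt = list(range(100))

baseline = ToyModel()
baseline_output = generate_without_cache(baseline, prompt, 20)

cached = ToyModel()
cache = prefill(cached, prompt)
cached_output = decode(cached, cache, 20)

print(baseline_output == cached_output)
print(baseline.kv_projections, cached.kv_projections)
```

Both paths produce the same 20 tokens, but the uncached baseline performs 2,190 projections against 120 for the cached path. The gap widens as prompts and outputs get longer, which is why cache management matters more at high call volumes.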
These technologies depend critically on scale. At small call volumes, maintaining complex cache and scheduling systems carries its own costs, which can offset the computing power saved. As these optimization techniques spread across the industry, token prices have gradually converged. For followers lacking Volcano Engine's economies of scale, matching low prices often means accepting greater cost pressure and potential losses. Volcano Engine, with its larger call volume, faces less cost pressure and has more room to keep optimizing, creating a sustainable competitive moat.
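The dependence on scale can be framed as a break-even calculation. The sketch below assumes a fixed overhead for running the cache and scheduling systems and a constant saving per million tokens served; both numbers are hypothetical, as are the function names.

```python
def net_savings(volume_millions, saving_per_million, system_overhead):
    """Compute saved by the optimization minus the cost of operating it."""
    return volume_millions * saving_per_million - system_overhead


def break_even_volume(saving_per_million, system_overhead):
    """Volume (in millions of tokens) at which the optimization pays for itself."""
    return system_overhead / saving_per_million


# Hypothetical inputs (not taken from the article).
SAVING_PER_MILLION = 0.5   # compute cost avoided per million tokens
SYSTEM_OVERHEAD = 5_000.0  # cost of maintaining cache and scheduling systems

print(break_even_volume(SAVING_PER_MILLION, SYSTEM_OVERHEAD))
print(net_savings(1_000, SAVING_PER_MILLION, SYSTEM_OVERHEAD))
print(net_savings(100_000, SAVING_PER_MILLION, SYSTEM_OVERHEAD))
```

With these assumed values, a provider serving 1,000 million tokens loses money on the optimization, while one serving 100,000 million comes out well ahead, which mirrors the position of low-volume followers described above.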
How Are AI Agents Reshaping the Market?
The MaaS market is undergoing a fundamental transformation. What began as a straightforward business of selling model APIs is evolving into enterprise infrastructure for AI agents, autonomous systems that can complete multiple types of work within a company. This shift dramatically increases customer stickiness and token consumption.
Token demand is already climbing rapidly. Volcano Engine's Doubao models saw daily average token usage surpass 120 trillion as of March 2025, representing a more than 1,000-fold increase from their May 2024 launch. Agent-related token consumption, while still a single-digit percentage of total token usage, is growing at an accelerating rate.
Volcano Engine launched ArkClaw, a cloud-based version of the open-source OpenClaw agent framework, to capitalize on this trend. ArkClaw offers monthly subscriptions ranging from 29 yuan (approximately $4.25) to 99 yuan, with common use cases including media generation for gaming organizations, office productivity tools, and coding assistance. The company also introduced the Ark Agent Plan, a broader subscription package supporting ByteDance's Seed series models as well as models from competitors Zhipu and Moonshot AI. This plan offers four tiers ranging from 40 yuan to 1,000 yuan per month and uses "Agent Fuel Points" as a unified unit for measuring resource consumption.
"Models are evolving to reduce inference costs and extend context windows, both crucial for the agent era," stated Li Guodong, chief architect of ArkClaw.
ByteDance is using itself as the first customer for these products. ByteClaw, an internal agent tool connected to the company's systems, has been widely adopted by employees, with multiple agents handling workplace tasks. As a result, developers are moving from writing code to setting goals and reviewing agent outputs.
What Does This Mean for the Broader AI Infrastructure Market?
China's enterprise MaaS market processed 1.944 quadrillion tokens on public clouds in 2025, marking a 16-fold year-on-year increase. IDC expects growth to accelerate further in 2026. However, the market's expansion is not benefiting all players equally. Volcano Engine's ability to maintain market share while competitors enter suggests that in infrastructure markets, scale and engineering efficiency create durable competitive advantages that price cuts alone cannot overcome.
ByteDance has signaled its commitment to this market by increasing planned AI infrastructure spending to more than 200 billion yuan this year, a figure at least 25% above an earlier proposal of 160 billion yuan. A larger share of this investment is expected to go to domestic AI chips, reflecting China's push for technological self-sufficiency in the face of international restrictions on advanced semiconductor exports.
The shift toward agentic AI also has implications for customer lock-in. When enterprises deploy AI agents across their internal systems, switching to a different cloud provider or model API becomes far more complex than changing a few lines of code. This structural stickiness transforms MaaS from a commoditized service into strategic infrastructure, fundamentally altering the competitive dynamics of the market.