FrontierNews.ai

A New Framework Generates Personalized Videos on Demand, Boosting Ad Revenue by Up to 1.87%

A new framework called Recommendation-as-Generation (RaG) bridges AI video creation with user preference modeling, generating personalized videos on the fly instead of recommending from a fixed pool of pre-produced content. The system was deployed at industrial scale on a platform with over 400 million daily active users, where online A/B testing showed up to a 1.87% improvement in ad revenue over production-grade generative recommendation baselines.

How Does This New Video Generation System Actually Work?

The core innovation lies in how RaG unifies two typically separate tasks: understanding what users want and creating videos that match those preferences. Rather than relying on a fixed pool of pre-produced videos, the system generates new content tailored to individual user interests in real time. This addresses a fundamental limitation of traditional recommendation systems, which can only suggest videos that already exist, even when user interests fall outside what's available.

The framework uses what researchers call Disentangled Semantic IDs (D-SIDs) as a bridge between recommendation and generation. Think of these as digital fingerprints that capture two distinct aspects of each video: what the video is about (entities, topics, content) and how it's made (style, rhythm, atmosphere, creative choices). A recommendation model predicts which D-SIDs match a user's interests, then passes those predictions to a video generation system that creates new content aligned with those preferences.
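The hand-off described above can be pictured with a small Python sketch. This is an illustration only, not the authors' implementation: the `DSID` class, the integer code tuples, `predict_dsid`, and `build_generation_request` are all hypothetical names, and the "recommender" here is just a lookup over precomputed scores rather than a trained model.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DSID:
    """Hypothetical disentangled ID: two separate code sequences."""
    content: Tuple[int, ...]  # what the video is about (entities, topics)
    style: Tuple[int, ...]    # how it is made (style, rhythm, atmosphere)


def predict_dsid(user_affinity: Dict[DSID, float]) -> DSID:
    """Stand-in for the recommendation model: return the best-scoring D-SID."""
    return max(user_affinity, key=user_affinity.get)


def build_generation_request(dsid: DSID) -> dict:
    """Package the predicted D-SID as conditioning input for a generator."""
    return {"content_codes": list(dsid.content), "style_codes": list(dsid.style)}


# Assumed affinity scores for one user; a real system would predict these.
scores = {
    DSID(content=(12, 7), style=(3, 1)): 0.42,
    DSID(content=(12, 7), style=(5, 0)): 0.81,
    DSID(content=(4, 9), style=(3, 1)): 0.17,
}
best = predict_dsid(scores)
request = build_generation_request(best)
```

Keeping the content and style codes in separate fields means two candidates can share a topic but differ in how the video is made, which is the distinction the article attributes to D-SIDs.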

To make this work at scale, the team developed Video Generation Agents (VGAs), which use a hierarchical planning approach rather than brute-force computation. Instead of running expensive, monolithic video generation pipelines for each user, the agents break the task into specialized steps. Three role-specialized agents handle visual composition, audio alignment, and artistic effects, all sharing a single language model backbone. A bounded reflection loop then refines cross-modal consistency, capping refinement iterations at two to balance quality with speed.
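A bounded reflection loop of this kind might look like the following Python sketch. It rests on several assumptions: the three agents are plain placeholder functions rather than calls into a shared language model, `find_issues` is a hypothetical critic that returns the roles whose plans need revision, and only the cap of two rounds comes from the article.

```python
from typing import Callable, Dict, List, Tuple

MAX_REFLECTION_ROUNDS = 2  # the cap reported in the article

Plan = Dict[str, str]


def visual_agent(brief: str) -> str:
    return f"visual plan for {brief}"


def audio_agent(brief: str) -> str:
    return f"audio plan for {brief}"


def effects_agent(brief: str) -> str:
    return f"effects plan for {brief}"


# Hypothetical registry of the three role-specialized agents.
AGENTS: Dict[str, Callable[[str], str]] = {
    "visual": visual_agent,
    "audio": audio_agent,
    "effects": effects_agent,
}


def generate_plan(
    brief: str,
    find_issues: Callable[[Plan], List[str]],
    max_rounds: int = MAX_REFLECTION_ROUNDS,
) -> Tuple[Plan, int]:
    """Draft one plan per role, then revise flagged roles for a bounded number of rounds."""
    plan = {role: agent(brief) for role, agent in AGENTS.items()}
    rounds = 0
    while rounds < max_rounds:
        issues = find_issues(plan)
        if not issues:
            break  # the plans are consistent, so stop early
        for role in issues:
            plan[role] = f"{AGENTS[role](brief)} (revision {rounds + 1})"
        rounds += 1
    return plan, rounds
```

The loop stops after at most `max_rounds` revisions even if the critic still reports issues, which is the trade of quality for predictable latency that the cap implies.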

What Makes This Different From Previous Video Generation Approaches?

Previous state-of-the-art video generation models, while visually impressive, remain difficult to deploy in large-scale systems. They typically rely on manual prompting, multi-stage refinement, and post-processing with professional tools, resulting in high latency and computational cost per video. Personalizing across hundreds of millions of users with diverse, long-tailed interests would be economically infeasible using those approaches.

RaG solves this through several practical innovations. The system uses an SID-indexed cache that amortizes generation costs across similar user interests, reducing redundant computation. The shared language model backbone enables KV-cache reuse across agents, substantially accelerating inference. These optimizations allow the system to reliably serve recommendation requests for hundreds of millions of users without prohibitive computational overhead.
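The amortization idea behind an SID-indexed cache can be sketched in a few lines of Python. The article does not describe the cache's design, so the `SIDCache` class, its in-memory dictionary, and the absence of any eviction policy are assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable


class SIDCache:
    """Hypothetical cache that maps a semantic ID to an already generated video."""

    def __init__(self, generate: Callable[[Hashable], str]) -> None:
        self._generate = generate            # the expensive generation call
        self._store: Dict[Hashable, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, sid: Hashable) -> str:
        """Return the cached video for this SID, generating it only on a miss."""
        if sid in self._store:
            self.hits += 1
            return self._store[sid]
        self.misses += 1
        video = self._generate(sid)
        self._store[sid] = video
        return video


# Three requests, two of which resolve to the same SID.
cache = SIDCache(generate=lambda sid: f"video<{sid}>")
served = [cache.get(sid) for sid in [(12, 7), (4, 9), (12, 7)]]
```

In this example the generator runs twice for three requests. The saving grows with the share of users whose predicted interests map to the same SID.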

What Is the Business Impact of Personalized Video Generation?

  • Revenue Improvement: Online A/B testing showed up to 1.87% improvement in ad revenue compared to production-grade generative recommendation baselines, demonstrating measurable business value beyond traditional recommendation metrics.
  • Scale of Deployment: The system was deployed on an industrial-scale platform with over 400 million daily active users, indicating that the approach can operate under real-world conditions with a massive user base.
  • Closed-Loop Optimization: The framework introduces Synergistic Cross-Domain Reward Learning (SCRL), which treats user feedback as the primary objective while treating interest alignment and video quality as constraints, enabling continuous improvement as the system learns from real user behavior.
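One common way to express "a primary objective subject to constraints" is a penalized reward, sketched below in Python. The article does not say how SCRL enforces its constraints, so the thresholds, the penalty weight, and the function name `constrained_reward` are all assumptions; the sketch only illustrates the general shape of such an objective.

```python
def constrained_reward(
    feedback: float,
    alignment: float,
    quality: float,
    min_alignment: float = 0.5,  # assumed threshold
    min_quality: float = 0.5,    # assumed threshold
    penalty: float = 1.0,        # assumed penalty weight
) -> float:
    """User feedback is the objective; constraint violations subtract from it."""
    alignment_gap = max(0.0, min_alignment - alignment)
    quality_gap = max(0.0, min_quality - quality)
    return feedback - penalty * (alignment_gap + quality_gap)
```

When both constraints are satisfied, the reward equals the user feedback signal alone. A video that falls below either threshold is penalized in proportion to the shortfall, so strong feedback cannot fully offset poor interest alignment or low quality.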

The revenue gains are particularly significant because they come from a revenue-critical advertising scenario, meaning the improvements translate directly to business outcomes rather than just engagement metrics. This suggests that personalized video generation, when properly optimized, can outperform strong production recommendation baselines in practical deployment.

The research highlights a fundamental shift in how recommendation systems might evolve. Rather than treating recommendation and content generation as separate problems, RaG demonstrates that integrating them into a closed-loop system where user interests, content quality, and real-world feedback co-evolve can unlock new value. For platforms managing hundreds of millions of users, this approach offers a path to more efficient personalization without the computational burden of traditional high-cost video generation pipelines.