How Google's Custom AI Models Are Finally Making Hollywood-Quality Films Possible
Google DeepMind's approach to AI filmmaking at Tribeca 2026 revealed a crucial insight: the future of generative video in Hollywood depends on custom-trained models tailored to specific artistic visions, not off-the-shelf text-to-video tools. Rather than feeding prompts into vanilla AI models, the most compelling films showcased at the festival used fine-tuned versions of Google's Veo and Imagen models trained on custom concept art, combined with traditional animation and human creative direction.
What Made Google's Veo Approach Different at Tribeca?
Google DeepMind's short film "Dear Upstairs Neighbors," written and directed by Pixar veteran Connie Qin He, demonstrated how custom AI models can solve the visual consistency problem that plagues most generative video projects. The film tells the story of an exhausted woman trying to sleep while her upstairs neighbors create chaos. To achieve a distinct, painterly aesthetic, the production team enlisted Pixar production designer Yingzong Xin, who created concept art using Photoshop and acrylics.
The critical innovation was that DeepMind's engineers developed custom versions of Veo and Imagen specifically trained on Xin's concept art. This fine-tuning approach allowed the models to consistently generate shots that adhered to the director's vision, something vanilla AI models struggle to do. The fine-tuned models excelled at reproducing stylistic details, such as the way sound is visualized when objects interact with one another.
However, the production workflow reveals why AI alone cannot create compelling films. The creative team used Autodesk Maya, the industry standard for 3D animation and visual effects, to create rough animations that ensured scenes would unfold exactly as intended. Only then did they feed these roughs into Veo to create visually polished versions, which were further enhanced with additional stylized assets generated by Veo and Imagen.
How Are Filmmakers Actually Using AI Video Tools in Production?
- Custom Model Training: Rather than using off-the-shelf models, studios are partnering with AI firms like Google to build bespoke versions trained on specific concept art, visual styles, and artistic direction to maintain consistency across scenes.
- Hybrid Workflows: Filmmakers combine traditional animation software like Autodesk Maya with AI video generation, using rough animations as input to Veo rather than relying on text prompts alone to generate scenes from scratch.
- Stylistic Guidance: Custom-trained models can reproduce specific artistic details and visual aesthetics that vanilla models cannot, enabling directors to maintain a cohesive visual language throughout their projects.
- Human-Driven Creative Decisions: The most successful projects at Tribeca relied on human artists making nuanced creative choices about pacing, composition, and narrative flow, with AI serving as a tool to enhance and polish those decisions rather than replace them.
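The hybrid workflow described above can be pictured as an ordered pipeline in which generation comes late and is driven by human-made inputs. The following is a minimal illustrative sketch, not DeepMind's actual tooling: every name in it (`Shot`, `make_stage`, `HYBRID_PIPELINE`, and the stage labels) is hypothetical, and the stages only record their labels rather than calling Maya, Veo, or Imagen.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Shot:
    """One shot moving through the pipeline; `history` records each stage applied."""
    name: str
    history: List[str] = field(default_factory=list)


Stage = Callable[[Shot], Shot]


def make_stage(label: str) -> Stage:
    """Build a placeholder stage that only logs its label (no real tool is called)."""
    def stage(shot: Shot) -> Shot:
        return Shot(shot.name, shot.history + [label])
    return stage


# Hypothetical stage names mirroring the order the article describes.
HYBRID_PIPELINE: List[Stage] = [
    make_stage("concept_art"),         # human-made concept art defines the look
    make_stage("fine_tune_models"),    # custom models are trained on that art
    make_stage("rough_animation"),     # artists block out the scene in 3D software
    make_stage("video_model_polish"),  # the roughs, not text prompts alone, drive generation
    make_stage("stylized_assets"),     # extra generated assets are layered in
]


def run_pipeline(shot: Shot, stages: List[Stage]) -> Shot:
    """Apply each stage in order and return the resulting shot."""
    for stage in stages:
        shot = stage(shot)
    return shot


if __name__ == "__main__":
    result = run_pipeline(Shot("bedroom_wide"), HYBRID_PIPELINE)
    print(" -> ".join(result.history))
```

The ordering is the point of the sketch: the rough animation stage sits before the video model stage, so the human-authored staging constrains what the model generates.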
Other films at Tribeca demonstrated both the potential and limitations of different AI video approaches. OpenAI's "Smoked," a semi-autobiographical drama about the Palisades Fire, used Sora to recreate fiery scenes. Its wide shots appeared somewhat cartoonish, while close-ups filmed using a Volume-like LED stage setup looked more convincing. "Mauvais Soleil," also from OpenAI, featured photorealistic scenes generated with Sora, but the filmmaker worked around AI limitations by keeping most shots brief and using only an unseen narrator as the speaking character.
In contrast, filmmaker Ash Koosha produced "Dreams of Violets," a docudrama about Iranian protests, spending just $2,000 on computing costs and using Kling AI, Claude, Gemini, and Nano Banana. While Koosha completed the project solo in just a few weeks, the film's visual impact remained limited despite its powerful narrative.
Why Are Studios Moving Away From Prompt-Based AI Video Generation?
The Tribeca showcase made clear that studios are unlikely to produce commercially viable content simply by feeding prompts into generative AI models. Most AI-generated video currently available online is what industry observers call "slop": visually inconsistent short bursts that lack the polish and coherence audiences expect from professional entertainment.
OpenAI's recent decision to shut down Sora entirely signals a broader pivot away from consumer-facing video generation tools. Because of the shutdown, the company's feature-length film "Critterz" was unable to debut at the Cannes Film Festival, which suggests that even OpenAI is rethinking its video-focused strategy.
What seems far more likely is that major AI firms like Google will partner directly with studios to build bespoke models tailored to very specific workflows. "Dear Upstairs Neighbors" functioned as both a compelling short film and a case study in how generative AI can serve as a specialized tool that genuinely assists artists in developing their ideas. The film's entire production relied on human-made art and the kinds of nuanced creative decisions that text-to-video generators cannot make independently.
The distinction matters because it reframes the AI video debate. Rather than asking whether AI can replace human filmmakers, the more productive question is how AI can enhance human creativity when deployed with intention and customization. The films that succeeded at Tribeca were those where AI served a specific purpose within a larger creative vision, not those that attempted to automate the entire filmmaking process.