Google's Interactions API Becomes the New Standard for Building AI Agents and Multimodal Apps
Google has officially launched its Interactions API as the primary way developers build applications with Gemini models and AI agents, moving beyond the older generateContent API. The unified endpoint, which reached general availability on June 22, 2026, now includes major new features like managed agents that can reason and execute code, background execution for long-running tasks, and multimodal generation capabilities including image, music, and speech synthesis.
What Changed Since the API's Public Beta Launch?
When Google first introduced the Interactions API to the public in December 2025, developers quickly embraced it as their preferred way to work with Gemini. The general availability release brings stability and several capabilities that the developer community specifically asked for. The API now features a stable schema, meaning developers can rely on consistent behavior as they build production applications.
The most significant additions include managed agents, which provision a remote Linux sandbox where an agent can reason, execute code, browse the web, and manage files. Google ships the Antigravity agent as the default option, but developers can define their own custom agents with specific instructions, skills, and data sources. Background execution allows developers to set background=True on any call, letting the server run interactions asynchronously without blocking user-facing operations.
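As a rough illustration of the background pattern described above, the Python sketch below submits a task and then polls until it finishes. FakeClient, FakeInteraction, wait_for_result, and the status strings are hypothetical stand-ins written for this example. They are not the Gemini SDK, whose actual method names and response fields should be taken from Google's documentation. Only the background=True flag comes from the release itself.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FakeInteraction:
    """Hypothetical record for an interaction; field names are assumptions."""
    id: str
    status: str  # "in_progress" or "completed" in this sketch
    output: Optional[str] = None


class FakeClient:
    """In-memory stand-in for a remote API client (NOT the real SDK)."""

    def __init__(self, polls_until_done: int = 2):
        self._polls_until_done = polls_until_done
        self._poll_counts = {}
        self._ids = itertools.count(1)

    def create(self, prompt: str, background: bool = False) -> FakeInteraction:
        interaction_id = f"interaction-{next(self._ids)}"
        if not background:
            # Blocking call: the result is available immediately.
            return FakeInteraction(interaction_id, "completed", f"answer to: {prompt}")
        # Background call: return a handle right away, finish later.
        self._poll_counts[interaction_id] = 0
        return FakeInteraction(interaction_id, "in_progress")

    def get(self, interaction_id: str) -> FakeInteraction:
        self._poll_counts[interaction_id] += 1
        if self._poll_counts[interaction_id] >= self._polls_until_done:
            return FakeInteraction(interaction_id, "completed", "final report")
        return FakeInteraction(interaction_id, "in_progress")


def wait_for_result(
    client: FakeClient,
    interaction_id: str,
    max_polls: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> FakeInteraction:
    """Poll until the interaction completes or max_polls is exhausted."""
    for _ in range(max_polls):
        interaction = client.get(interaction_id)
        if interaction.status == "completed":
            return interaction
        sleep(delay_seconds)  # pass time.sleep here against a real service
    raise TimeoutError(f"{interaction_id} did not complete in {max_polls} polls")
```

The caller gets a handle back immediately and can keep serving users while the work continues. A real client would pass time.sleep (or an async equivalent) as the sleep argument, which defaults to a no-op here so the sketch runs instantly.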
How to Get Started Building With the Interactions API
- Unified Endpoint: Pass a model ID for inference or an agent ID for autonomous tasks in just a few lines of code, simplifying the development process compared to managing separate API calls
- Tool Mixing: Combine built-in tools like Google Search and Google Maps with custom functions in a single request, with tool results now able to return images alongside text for richer responses
- Media Generation: Generate images with Nano Banana 2 and Google Image Search grounding, create music with Lyria 3, and produce expressive speech with multi-speaker text-to-speech capabilities
- Cost Optimization: Choose between Flex and Priority tiers to optimize for either cost reduction (Flex offers 50% savings) or lower latency depending on your application needs
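To make the Flex figure in the list above concrete, here is a small Python sketch of the arithmetic. It assumes the 50% savings is measured against standard pricing, which the announcement does not spell out. The function name and the monthly spend figure are hypothetical, and Priority pricing is left out because no number is given for it.

```python
# Discount taken from the article: Flex offers 50% savings.
# Assumption: the savings apply relative to the standard price.
FLEX_SAVINGS = 0.50


def flex_cost(standard_cost: float) -> float:
    """Estimate the Flex-tier cost of a workload priced at standard_cost."""
    if standard_cost < 0:
        raise ValueError("standard_cost must be non-negative")
    return standard_cost * (1 - FLEX_SAVINGS)


monthly_standard = 120.00  # hypothetical monthly spend in dollars
print(f"{flex_cost(monthly_standard):.2f}")  # prints 60.00
```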
The API also introduces a simplified schema, described as a move "From Roles to Steps," in which every action (user input, model reasoning, a function call, or an output) becomes its own typed step. This replaces the previous role-based structure and makes it easier for developers to understand and debug their applications.
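One way to picture typed steps is as a list of small, distinct record types rather than messages tagged with a role. The Python sketch below models that idea locally. Every class and field name in it is a hypothetical illustration, not the API's actual schema, which is defined in Google's reference documentation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class UserInputStep:
    text: str


@dataclass
class ReasoningStep:
    summary: str


@dataclass
class FunctionCallStep:
    name: str
    arguments: dict


@dataclass
class ModelOutputStep:
    text: str


# Hypothetical union of step types; a real schema may define more.
Step = Union[UserInputStep, ReasoningStep, FunctionCallStep, ModelOutputStep]


def describe(step: Step) -> str:
    """Return a one-line label for a step, keyed on its type."""
    if isinstance(step, UserInputStep):
        return f"user_input: {step.text}"
    if isinstance(step, ReasoningStep):
        return f"reasoning: {step.summary}"
    if isinstance(step, FunctionCallStep):
        return f"function_call: {step.name}"
    if isinstance(step, ModelOutputStep):
        return f"output: {step.text}"
    raise TypeError(f"unknown step type: {type(step).__name__}")


def function_calls(steps: List[Step]) -> List[FunctionCallStep]:
    """Pick out only the function-call steps, e.g. when debugging tool use."""
    return [step for step in steps if isinstance(step, FunctionCallStep)]


trace: List[Step] = [
    UserInputStep("What is the weather in Paris?"),
    ReasoningStep("A weather lookup is needed."),
    FunctionCallStep("get_weather", {"city": "Paris"}),
    ModelOutputStep("It is sunny in Paris."),
]
```

Because each step carries its own type, a debugging helper such as function_calls(trace) can filter on type alone instead of inspecting the contents of role-tagged messages.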
Deep Research, Google's agent for conducting in-depth investigations, has received significant upgrades. The new version includes two agent variants, one optimized for speed and one for depth, along with collaborative planning features, native charts and infographics, and multimodal grounding that works with images, PDFs, and audio files.
"The Interactions API is now the default for Google AI Studio, the Gemini API, and all our documentation, which includes a toggle to switch snippets back to the legacy format. We recommend using the Interactions API for all new projects and applications," stated Ali Çevik, Group Product Manager at Google DeepMind.
What Happens to the Older API?
Google is not abandoning the legacy generateContent API. The company confirmed that it will remain fully supported and continue receiving new mainline Gemini models for the foreseeable future. However, frontier capabilities for long-running models and agents will increasingly launch exclusively on the Interactions API, since it was designed from the ground up for stateful, agentic workflows.
For developers currently using generateContent, Google has published a migration guide that maps every field to the new schema, allowing teams to transition at their own pace. The Interactions API is available through Python and JavaScript SDKs, and developers can also access it through ecosystem partners like LiteLLM, Eigent, and Agno.
To help agents stay current with the latest API patterns, Google built the gemini-interactions-api Skill, which injects best-practice development patterns into an agent's context. This includes guidance on streaming, function calling, structured output, and Deep Research capabilities.
The timing of this release reflects broader industry momentum around multimodal AI, which combines audio, visual, and text understanding. As speech and language technologies advance, the ability to seamlessly integrate audio-visual processing into applications becomes increasingly important for developers building conversational AI, video analysis tools, and intelligent agents.