Apple's New Foundation Models Fit a 20 Billion Parameter AI Into Your iPhone
Apple has unveiled a new generation of foundation models that bring sophisticated artificial intelligence directly to iPhones and other devices, marking a significant shift in how on-device AI operates. The Apple Foundation Models (AFM), announced at Apple's WWDC26 developer conference in June 2026, consist of five foundational models developed in collaboration with Google. The standout achievement is the AFM 3 Core Advanced, which runs a 20 billion parameter multimodal model natively on iPhones, enabling features like expressive voice input and high-precision voice recognition for a rebuilt Siri.
How Do These Models Fit Massive AI Into Your Phone?
The technical breakthrough behind fitting such a large model onto consumer hardware involves a clever architectural innovation. Instead of storing the entire 20 billion parameter model in a device's active memory (DRAM), which would be impossible on current iPhones, Apple uses what it calls a "sparse active architecture." This approach stores the complete model in flash memory (the same storage used for apps and photos) and selectively loads only the parameters needed for each task into active memory.
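A rough calculation shows why the full model cannot simply sit in active memory. The sketch below computes the size of the weights alone for a 20 billion parameter model at a few storage precisions. The bit widths are illustrative assumptions, not figures Apple has published, and the calculation ignores activations, caches, and everything else that competes for DRAM.

```python
# Back-of-the-envelope weight footprint.
# The precisions below are assumptions for illustration, not Apple's figures.

def weight_gigabytes(params: float, bits_per_weight: int) -> float:
    """Return the size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

FULL_MODEL_PARAMS = 20e9  # total parameter count, from the article

for bits in (16, 8, 4):  # hypothetical storage precisions
    size = weight_gigabytes(FULL_MODEL_PARAMS, bits)
    print(f"{bits}-bit weights: {size:.1f} GB")
```

Even under aggressive 4-bit storage, the weights alone come to 10 GB in this estimate, which is why keeping the complete model in flash and loading only a subset is attractive.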
The AFM 3 Core Advanced uses a technique called Instruction-Following Pruning (IFP) to determine which parameters to activate based on your input. Once activated, these parameters stay fixed in memory, effectively transforming the 20 billion parameter model into a dense model with 1 to 4 billion active parameters. Apple states this "minimizes latency while enabling model scales far exceeding traditional DRAM limitations," meaning you get powerful AI responses without noticeable delays.
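The select-once, then run-dense idea described above can be illustrated with a toy feed-forward layer. This is a minimal sketch, not Apple's implementation: the sizes, the dot-product scoring rule, and every name in it (`flash`, `select_units`, `load_active`, `forward`) are hypothetical, chosen only to show a small subset of units being picked per request and then reused unchanged for every token.

```python
import random

random.seed(0)
DIM, HIDDEN, ACTIVE = 4, 16, 4  # toy sizes, chosen for illustration only

# "Flash": the full set of hidden units. Each unit has a selection key,
# an input weight vector, and an output weight vector.
flash = [
    {
        "key": [random.gauss(0, 1) for _ in range(DIM)],
        "w_in": [random.gauss(0, 1) for _ in range(DIM)],
        "w_out": [random.gauss(0, 1) for _ in range(DIM)],
    }
    for _ in range(HIDDEN)
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select_units(instruction_vec, k=ACTIVE):
    """Score every unit against the instruction and return the top-k indices.

    The dot-product score is an invented stand-in for a learned selector.
    """
    scores = [dot(unit["key"], instruction_vec) for unit in flash]
    ranked = sorted(range(HIDDEN), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def load_active(indices):
    """Gather only the selected units into 'DRAM' (a plain list here)."""
    return [flash[i] for i in indices]

def forward(active_units, x):
    """Dense feed-forward pass over the active units only (ReLU activation)."""
    out = [0.0] * DIM
    for unit in active_units:
        h = max(0.0, dot(unit["w_in"], x))
        for j in range(DIM):
            out[j] += h * unit["w_out"][j]
    return out

instruction = [1.0, 0.0, -1.0, 0.5]     # hypothetical embedding of the request
active_idx = select_units(instruction)  # chosen once per request
dram = load_active(active_idx)          # stays fixed while tokens are processed

for token_vec in ([0.2, 0.1, 0.0, 0.3], [0.0, 1.0, 0.5, -0.2]):
    y = forward(dram, token_vec)        # no further reads from "flash"
```

In this toy, 4 of 16 units are active, so each token touches a quarter of the layer. The article's figures imply a similar ratio at scale: 1 to 4 billion active parameters out of 20 billion.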
What Are the Five Apple Foundation Models?
Apple's new foundation model family includes both on-device and cloud-based options, each optimized for different tasks:
- AFM 3 Core: A lightweight on-device model with 3 billion parameters, designed for basic tasks that don't require maximum power.
- AFM 3 Core Advanced: Apple's most powerful on-device model with 20 billion parameters, natively multimodal to handle text, images, and voice simultaneously.
- AFM 3 Cloud: The flagship server-side model optimized for speed, efficiency, and performance when your device connects to Apple's servers.
- ADM 3 Cloud (Image): A specialized model for image generation and editing, powering new photo tools and a feature called Image Playground.
- AFM 3 Cloud Pro: Apple's most powerful server-based model, supporting demanding applications like agent-based tools and complex reasoning tasks.
All models except AFM 3 Cloud Pro are specifically designed for Apple Silicon, the custom processors that power iPhones, iPads, and Macs. For AFM 3 Cloud Pro, Google and NVIDIA collaborated to ensure high performance while protecting user privacy through Apple's Private Cloud Compute system.
How Does Apple's Privacy Approach Differ?
Apple emphasizes that these models are trained without using your personal data or interactions. The company states it does not use "users' personal private data or interactions with users to train our base models." Instead, the models are trained using publicly available information, licensed or purchased data from third parties, open-source data, data from dedicated research, and synthetic data. Apple also respects web publishers' rights to opt out of having their content used for model training.
The server-side models benefit from Apple's Private Cloud Compute infrastructure, which the company designed to process sensitive requests on privacy-protecting servers. This system has achieved improvements in multimodal inference capabilities, including better training stability and enhanced ability to recall information within complex context windows for server-side queries.
What Does This Mean for Developers?
Apple has made the Apple Foundation Models framework publicly available, allowing developers to build apps using these models. Google has integrated its Gemini models into the framework, so Apple's on-device models and cloud-hosted Gemini models can be called through a common API. Developers can therefore switch between local and cloud inference depending on their app's needs.
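The value of a common API is that application code does not need to know where inference happens. The sketch below illustrates that pattern in general terms. It is not the Foundation Models framework's actual API: `TextModel`, `LocalModel`, `CloudModel`, `pick_model`, and `summarize` are hypothetical names, and both backends return canned strings rather than calling a real model.

```python
from typing import Protocol

class TextModel(Protocol):
    """Hypothetical shared interface that both backends satisfy."""
    def generate(self, prompt: str) -> str: ...

class LocalModel:
    """Stand-in for an on-device model; returns a canned string."""
    def generate(self, prompt: str) -> str:
        return f"[local] {prompt}"

class CloudModel:
    """Stand-in for a cloud-hosted model; no network call is made."""
    def generate(self, prompt: str) -> str:
        return f"[cloud] {prompt}"

def pick_model(needs_heavy_reasoning: bool) -> TextModel:
    """Choose a backend; the selection rule here is an illustrative assumption."""
    return CloudModel() if needs_heavy_reasoning else LocalModel()

def summarize(model: TextModel, text: str) -> str:
    """Application code depends only on the shared interface."""
    return model.generate(f"Summarize: {text}")

print(summarize(pick_model(False), "meeting notes"))
print(summarize(pick_model(True), "meeting notes"))
```

Because `summarize` depends only on the interface, switching backends is a change at the call site rather than in the feature code.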
Apple also released "Core AI," a framework it positions as the preferred way to run on-device models within apps. Core AI is built into the operating system and is designed to get the most out of Apple Silicon. Developers can call Gemini from Xcode, Apple's development environment, for help during development. On macOS, developers can use the Foundation Models SDK for Python to integrate with commonly used tools and evaluation packages.
However, some developers have noted gaps in Apple's profiling tools. Marco Abis, who develops the local AI profiler "Ziraph" on Apple Silicon, pointed out that while Core AI's profiling tools expose timing information, they do not expose energy, memory bandwidth, or thermal data. Abis called this "a major omission considering these metrics greatly influence a device's performance."
What's the Relationship Between Apple and Google's Models?
While Apple has not explicitly detailed how its models relate to Google's Gemini family, technology media outlet MacStories has speculated based on available information. The outlet suggests that AFM Cloud may be based on Gemini 3.1 Flash-Lite, AFM Cloud Pro on Gemini 3.5 Flash, and ADM Cloud on a model called Nano Banana Pro. Apple has not confirmed any of these pairings, and the exact technical connections remain unclear.
What is clear is that the collaboration represents a significant partnership between two major technology companies to advance on-device AI capabilities. By integrating Gemini models into Apple's framework, both companies are working toward a future where powerful AI runs efficiently on consumer devices without requiring constant cloud connectivity.