Logo
FrontierNews.ai

Ollama 0.24 Transforms Local AI From Model Runner to Coding Agent Platform

Ollama 0.24, released on May 14, 2026, marks a fundamental shift in how developers can run AI coding agents locally. The update adds official support for OpenAI's Codex App, meaning the desktop AI coding agent can now run on any local or cloud model through Ollama, eliminating the need for a mandatory OpenAI subscription. This transforms Ollama from a tool for running language models into a platform for autonomous coding agents.

What's the Difference Between Running a Model and Running an AI Agent?

Before Ollama 0.24, the platform answered the question "how do I run a model locally?" Now it answers "how do I run an AI coding agent locally?" This is a meaningful distinction. Running a model means sending a prompt and getting a response. Running an agent means giving it a task, watching it plan, execute code, run tests, debug errors, and deliver a result without manual intervention at each step.

Codex App is a desktop application for macOS and Windows that operates as a full-fledged agent with access to your repository, terminal, and browser. Unlike IDE plugins that suggest code as you type, Codex App receives high-level tasks like "add authentication via OAuth" and writes the implementation itself, runs tests, and fixes errors autonomously.

How to Set Up Ollama with Codex App in Three Steps

  • Update Ollama: Ensure you have version 0.24.0 or newer installed. On macOS and Linux, run the installation script from ollama.com. Windows users should download the new installer from the official website.
  • Install Codex App: Download the desktop application from developers.openai.com/codex/quickstart and launch it once manually to initialize configuration files.
  • Launch with Ollama: Run the command "ollama launch codex-app" and Ollama automatically configures Codex to use its local OpenAI-compatible endpoint. You can specify a model with flags like "ollama launch codex-app --model qwen3:14b" for local models or "ollama launch codex-app --model kimi-k2.6:cloud" for cloud-based models.

The configuration persists, so subsequent launches of Codex App will use your selected model automatically. If you want to revert to the original OpenAI API setup, Ollama saves a backup that you can restore.

Why Model Selection Matters for Coding Agents

Not all language models work equally well with Codex App. The agent relies on "tool calling," a capability that lets models request actions like writing files or running commands. Weak tool calling means the agent stops mid-task or returns plain text instead of properly formatted instructions, breaking the entire workflow.

This requirement has broader implications for how AI models are being trained. Across recent releases on Hugging Face, there is a strong emphasis on agentic tool calling and long-context reasoning. If you want a model to work effectively with an agent harness, it needs to execute tool calls reliably and maintain context when those calls return large amounts of information.

The shift toward agentic capabilities is reshaping model development itself. Reinforcement learning, a training technique that teaches models new skills, is increasingly used to improve tool calling and reasoning abilities. DeepSeek R1 popularized this approach by using reinforcement learning to teach models chain-of-thought reasoning, and many recent open-source models now follow this pattern.

What's New Beyond Codex App Integration?

While Codex App support is the headline feature, Ollama 0.24 includes several other improvements:

  • MLX Memory Trace Logging: Apple Silicon users can now log memory usage for models running on Mac M-series chips, helping optimize performance on local hardware.
  • Improved MLX Sampler: The sampling mechanism for generating text on Mac M-series has been enhanced, delivering higher quality outputs from local models.
  • Reliable Updates: Fixed issues with Ollama App auto-updates, ensuring smoother version transitions for users.
  • Response Caching: The "ollama show" command now caches responses, resulting in faster startup times.

The Practical Reality of Local AI Coding Agents

While Codex App automates code generation, developers should treat its output as a draft rather than final code. AI agents excel at mechanical work like writing boilerplate, adding test coverage, and refactoring according to explicit instructions. However, architectural decisions require human judgment.

The most effective approach is to describe architectural constraints upfront. For example, specifying "use the Repository pattern, a separate service layer from the controller, and do not put business logic in entities" guides the agent toward better design decisions. Without such guidance, the agent defaults to the simplest path, which may not align with your project's architecture.

This shift toward local AI agents reflects a broader trend in infrastructure. Agent harnesses like Codex App, Claude Code, and others are changing how compute resources are allocated. These agents make dozens of requests, each generating hundreds of lines of code, creating significant demand for inference optimization. Consequently, CPUs are experiencing a resurgence in importance, with Intel Xeon processors and ARM-based chips selling faster than manufacturers can produce them.

The infrastructure implications are substantial. Cloud providers are racing to build inference-optimized systems, and companies like Nvidia have acquired specialized AI accelerator technology to handle agentic workloads more efficiently. As these agent harnesses mature and demand grows, hyperscalers may begin offloading some computational work to client devices to reduce costs.

For developers interested in local AI coding agents, Ollama 0.24 removes a significant barrier: the need for paid cloud subscriptions. The release represents a maturation of open-source tooling, enabling teams to run sophisticated autonomous coding agents entirely on their own infrastructure.