IBM's Granite Strategy Exposes the Real Problem With Enterprise AI: It's Not the Models
IBM's leadership is challenging the entire premise of enterprise AI spending: bigger models aren't always better. According to Neel Sundaresan, IBM's Automation and AI General Manager and a founding engineer of GitHub Copilot, most organizations are deploying expensive frontier AI models like Claude 3.5 Sonnet or GPT-4o on tasks that don't require their advanced capabilities. It's like taking a Ferrari to buy milk, he argues, when a more modest vehicle would do the job at a fraction of the cost.
Why Are Companies Overspending on AI Models?
The problem isn't that frontier models lack capability. These cutting-edge systems excel at complex reasoning and novel code generation. The issue is architectural: most AI coding tools default to the most expensive model for every task, regardless of complexity. Sundaresan's critique is grounded in two decades of developer productivity research, which suggests that roughly 80 percent of coding work consists of routine tasks like boilerplate code generation, refactoring, test generation, and documentation. These tasks don't require a 175-billion-parameter model or justify pricing of $15 per million tokens.
A smaller model like IBM's 7-billion-parameter Granite or Mistral Nemo can handle these routine tasks at 10 percent of the cost while maintaining 95 percent of the quality, according to Sundaresan. Yet most developers never see that option because their tools are designed to route everything to the most powerful model available.
How Does IBM's Task Routing Solution Work?
IBM's answer to this inefficiency is IBM Bob, an intelligent routing system that analyzes each coding task and selects the optimal model from a range of options, including Claude, Mistral, Granite, and fine-tuned specialized models. The system also adds human checkpoints for high-risk actions, ensuring quality control alongside cost savings. Sundaresan reported that 80,000 IBM developers are using Bob daily, with measurable gains in delivery velocity.
The innovation isn't the models themselves; it's the orchestration layer that matches task complexity to model capability. This approach extends beyond individual coding tasks to entire development workflows, routing planning, coding, testing, deployment, and security review to specialized agents rather than relying on a single generalist model.
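As a rough illustration of the orchestration idea, such a layer can be sketched as a router that classifies each task and picks a model tier, flagging high-risk work for human review. The model names, per-token prices, and keyword heuristic below are illustrative assumptions, not IBM Bob's actual logic.

```python
# Hypothetical sketch of a complexity-based model router.
# Model names, prices, and the keyword heuristic are assumptions
# for illustration, not IBM's implementation.

ROUTINE_KEYWORDS = {"boilerplate", "refactor", "test", "docstring", "documentation"}
HIGH_RISK_KEYWORDS = {"deploy", "delete", "migrate", "production"}

MODEL_TIERS = {
    "small":    {"model": "granite-7b",        "usd_per_mtok": 0.20},
    "frontier": {"model": "frontier-model-xl", "usd_per_mtok": 15.00},
}

def route(task_description: str) -> dict:
    """Pick a model tier for a task and flag high-risk actions for review."""
    words = set(task_description.lower().split())
    tier = "small" if words & ROUTINE_KEYWORDS else "frontier"
    return {
        "tier": tier,
        **MODEL_TIERS[tier],
        "needs_human_review": bool(words & HIGH_RISK_KEYWORDS),
    }

print(route("generate boilerplate unit tests for the parser"))
print(route("design a novel caching algorithm and deploy it"))
```

A production router would classify tasks with a small model or learned heuristics rather than keywords, but the shape is the same: cheap classification up front, expensive generation only when needed.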
Steps to Optimize Your Enterprise AI Spending
- Audit Current Model Usage: Review which AI models your organization is using for which tasks. Identify routine work like boilerplate generation, refactoring, and test creation that may be running on expensive frontier models unnecessarily.
- Implement Task-Specific Routing: Deploy an orchestration layer that analyzes task complexity and routes work to appropriately sized models. Smaller, cheaper models can handle 80 percent of coding tasks while reserving frontier models for high-value work like system architecture and novel algorithm design.
- Monitor Total Cost of Ownership: Track token spend and correlate it with business outcomes. A tool burning $10,000 monthly on routine code generation will face budget scrutiny when leadership reviews the return on investment.
- Build Specialized Workflows: Rather than deploying a single generalist agent, create specialized tools for specific workflows such as security code review, test generation with coverage analysis, or legacy codebase refactoring.
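The audit and monitoring steps above can be sketched as a small token-spend ledger that breaks cost down by task category and model, surfacing routine work running on expensive models. The prices, categories, and token counts below are made-up numbers for illustration.

```python
from collections import defaultdict

# Hypothetical per-million-token prices; real prices vary by provider.
PRICE_PER_MTOK = {"granite-7b": 0.20, "frontier-model-xl": 15.00}

class SpendTracker:
    """Accumulate token spend per (task category, model) pair."""

    def __init__(self):
        self.spend = defaultdict(float)

    def record(self, category: str, model: str, tokens: int):
        self.spend[(category, model)] += tokens / 1_000_000 * PRICE_PER_MTOK[model]

    def report(self):
        # Sorted output makes routine work on frontier models easy to spot.
        for (category, model), usd in sorted(self.spend.items()):
            print(f"{category:12s} {model:18s} ${usd:,.2f}")

tracker = SpendTracker()
tracker.record("boilerplate", "frontier-model-xl", 50_000_000)  # downgrade candidate
tracker.record("boilerplate", "granite-7b", 50_000_000)
tracker.record("architecture", "frontier-model-xl", 2_000_000)
tracker.report()
```

At these illustrative prices, the same 50 million tokens of boilerplate cost $750 on the frontier model versus $10 on the small one, which is exactly the kind of line item that invites budget scrutiny.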
"Most AI coding is like taking your Ferrari to buy milk, deploying expensive frontier models on routine tasks where cheaper specialized models and better orchestration deliver better outcomes."
— Neel Sundaresan, Automation and AI General Manager at IBM
What Does This Mean for AI Startups and Investors?
Sundaresan's framing exposes a fundamental distortion in how the AI coding market measures return on investment. Agentic platforms like Devin and Cursor market the vision of autonomous engineering teams, but the economics favor specialized tools for the bulk of the workload. Enterprise buyers will adopt expensive generalists for high-value tasks, but they'll route routine work to cheaper alternatives.
This creates opportunity for startups that build the routing layer, model marketplace, or workflow orchestration platform. Companies like Replit, Sourcegraph, and Continue are already moving in this direction by routing tasks across models and adding verification steps. The compute-heavy demos that generate impressive videos but face token cost scrutiny at scale are less likely to survive long-term.
Startups can position cheaper, narrower coding assistants by owning specific workflows. A tool that excels at test generation with built-in coverage analysis and continuous integration (CI) support beats a generalist on that task 90 percent of the time at 10 percent of the cost. Security code review agents that scan for OWASP Top 10 vulnerabilities and suggest fixes have a clear enterprise path. Refactoring specialists that optimize legacy codebases for performance and maintainability address a $100 billion market.
How Is IBM Addressing the Broader Orchestration Challenge?
Beyond the IBM Bob system, IBM is positioning its watsonx Orchestrate platform as a centralized control plane for managing AI agents across the enterprise. The platform aims to help organizations build, deploy, and manage AI agents at scale, bridging the gap between experimental AI and production-grade automation.
The shift from generative chat to autonomous agents introduces significant operational complexity. Most modern enterprise architectures are messy combinations of SaaS applications, on-premises databases, and brittle middleware. The watsonx Orchestrate platform aims to simplify how agents interact with these environments, but the migration cost and skills gap required to move from basic prompt engineering to complex agentic orchestration are often underestimated.
Research data shows that fewer than 20 percent of enterprises have an AI-ready data architecture, reinforcing how difficult it is to operationalize agents in existing brownfield environments. Success should be measured through outcomes like the reduction in Mean Time to Remediation (MTTR) for automated tasks and the consistency of telemetry normalization across different agent types.
IBM emphasizes openness through its Granite models, but the orchestration layer itself acts as a powerful gravity well. Competitors like Salesforce with Agentforce or Microsoft with Copilot Studio offer deep vertical integration within their respective ecosystems. IBM must win on the merit of its cross-platform neutrality, positioning itself as the orchestrator for heterogeneous software stacks rather than a destination tied to a single ecosystem.
The broader market trend is moving away from "AI as a feature" toward "AI as a workforce." The key development to watch is the emergence of the Agentic Supervisor, a new role that combines traditional IT operations with AI ethics and oversight. IBM is attempting to claim the high ground in this supervisor category before it becomes commoditized by hyperscalers.
The success of IBM's orchestration platform will ultimately depend on whether it can lower the cost to operate for complex workflows and deliver enough incremental value to justify additional licensing and architectural overhead. The next three quarters will reveal whether enterprises are ready to invest in a dedicated orchestration layer or if they will wait for their existing providers to offer these capabilities natively.