Why AI Agents Are Sending Enterprise Costs Skyrocketing, and How HPE Is Fighting Back
AI agents are far more expensive to operate than traditional AI systems because they continuously reason, validate, and act on decisions, consuming tokens at every step. Unlike a chatbot that answers a single question and stops, an agent keeps working, making multiple decisions and interacting with other systems. This constant activity creates a token consumption problem that's forcing major enterprises to abandon cloud-only AI strategies and build their own on-premises AI infrastructure.
What's Driving the Shift Away From Cloud AI?
For years, enterprises assumed cloud computing was the natural home for artificial intelligence workloads. But the economics of agentic AI, a category of AI systems designed to operate autonomously and make decisions over time, have upended that assumption. Hewlett Packard Enterprise (HPE) executives recently shared how their support systems process billions of operational signals daily, and as those systems became more autonomous, the token costs became unsustainable.
The problem is straightforward: every time an AI agent reasons through a problem, validates a decision, or takes an action, it consumes tokens, the units of text that AI models process and that providers bill for. Unlike traditional AI interactions that happen once and end, agents operate continuously. According to validation firm Signal65, agents can use 4 to 15 times as many tokens as standard AI chat interactions, and as autonomous agents evolve, they could drive 1,000 times the inference demand of reasoning AI.
To illustrate the scale of the problem, consider OpenClaw, a widely used virtual personal AI agent. Public data shows it processed more than 600 billion tokens in a single month to support roughly 100 continuously operating coding agents, which works out to approximately $13,000 per agent per month.
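The OpenClaw figures can be checked with some quick arithmetic. The sketch below uses only the approximate numbers quoted above. The implied price per million tokens is a derived estimate, not a published rate, and it assumes the full monthly cost is token charges.

```python
# Figures quoted in the article for OpenClaw (approximate, as reported).
TOKENS_PER_MONTH = 600e9       # "more than 600 billion tokens"
AGENT_COUNT = 100              # "roughly 100" coding agents
COST_PER_AGENT_USD = 13_000    # "approximately $13,000 per agent per month"

tokens_per_agent = TOKENS_PER_MONTH / AGENT_COUNT
total_cost_usd = COST_PER_AGENT_USD * AGENT_COUNT

# Assumption: the entire cost is token charges, so this is a blended estimate.
usd_per_million_tokens = total_cost_usd / (TOKENS_PER_MONTH / 1e6)

print(f"Tokens per agent per month: {tokens_per_agent:,.0f}")
print(f"Total monthly cost: ${total_cost_usd:,.0f}")
print(f"Implied price: ${usd_per_million_tokens:.2f} per million tokens")
```

On these numbers, each agent consumes about 6 billion tokens a month, the fleet costs about $1.3 million a month, and the implied blended price is a little over $2 per million tokens.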
How Are Enterprises Responding to Rising AI Costs?
Rather than accept these escalating expenses, enterprises are taking control of their AI infrastructure by moving workloads back on-premises and to the edge. HPE, Dell Technologies, and Cisco Systems have all launched initiatives to help organizations build what they call "AI factories," which are data centers designed specifically to run AI workloads efficiently at scale.
HPE's response demonstrates the potential savings. The company built an AI-first support platform on its own infrastructure using GreenLake Intelligence, a framework of AI agents, and Private Cloud AI, an on-premises AI infrastructure engineered with Nvidia. As a result, HPE cut costs by a factor of more than 30 and saved nearly $100,000 per month.
"Once agents have continuous access to data, every interaction consumes a token. That includes every decision, includes some validating the decision, and includes them taking the action. Unlike traditional AI, agents don't stop after one response. They continuously reason, they continuously coordinate, and they continuously interact with other systems," said Fidelma Russo, executive vice president, president and general manager of HPE's hybrid cloud business unit and the company's chief technology officer.
This shift represents a fundamental change in how enterprises think about AI economics. According to Steve McDowell, founder and chief analyst with NAND Research, the "cloud for everything" approach that seemed inevitable just two years ago is proving impractical for production AI workloads. While cloud infrastructure excels at certain AI tasks, inference, the process of running a trained model to make predictions, often works better closer to home.
What Tools Are Enterprises Using to Manage On-Premises AI?
Managing distributed AI infrastructure requires new software tools and frameworks. HPE and other vendors are introducing systems designed specifically for this hybrid environment. HPE's upcoming ProLiant DL394 Gen12 server will come with Nvidia's Agent Toolkit, which includes several key components:
- OpenShell Secure Runtime: A secure execution environment for running AI agents safely on on-premises infrastructure.
- NemoClaw Blueprints: Pre-built templates and configurations that help developers design and deploy multi-agent environments without starting from scratch.
- Nemotron Models: Nvidia's specialized AI models optimized for agent-based workflows and orchestration.
Beyond hardware, HPE is also introducing software tools to monitor and control AI spending. OpsRamp Copilot for AI agents and large language models tracks utilization, token-based consumption, and costs related to agents, AI factories, and workloads. GreenLake Intelligence includes a central agent registry so organizations know not only what agents they have, but also where they are and what they're permitted to do.
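To make the idea of a central agent registry with token-based cost tracking concrete, here is a minimal sketch. It is not the OpsRamp or GreenLake Intelligence API. The class names, fields, location labels, and the flat per-token rate are all hypothetical, chosen only to show what such a registry has to record: which agents exist, where they run, what they may do, and how many tokens they have consumed.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    """One registry entry: what the agent is, where it runs, what it may do."""
    name: str
    location: str                  # hypothetical labels, e.g. "on-prem" or "edge"
    permissions: frozenset
    tokens_used: int = 0


@dataclass
class AgentRegistry:
    """Hypothetical in-memory registry; a flat token rate is assumed."""
    usd_per_million_tokens: float
    agents: dict = field(default_factory=dict)

    def register(self, name, location, permissions):
        self.agents[name] = AgentRecord(name, location, frozenset(permissions))

    def is_permitted(self, name, action):
        return action in self.agents[name].permissions

    def record_usage(self, name, tokens):
        self.agents[name].tokens_used += tokens

    def cost_usd(self, name):
        return self.agents[name].tokens_used / 1e6 * self.usd_per_million_tokens


# Illustrative usage with made-up values.
registry = AgentRegistry(usd_per_million_tokens=2.0)
registry.register("ticket-triage", "on-prem", {"read_tickets", "open_case"})
registry.record_usage("ticket-triage", 1_500_000)
registry.record_usage("ticket-triage", 500_000)

print(registry.is_permitted("ticket-triage", "open_case"))    # True
print(registry.is_permitted("ticket-triage", "delete_case"))  # False
print(registry.cost_usd("ticket-triage"))                     # 4.0
```

A production system would persist these records and meter usage at the inference gateway rather than trusting agents to report it, but the bookkeeping is essentially the same.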
How Can Organizations Optimize AI Agent Efficiency?
Beyond moving infrastructure on-premises, enterprises are discovering that memory management is critical to controlling costs. In traditional computing, memory is a technical detail. In AI, it becomes a strategic resource because rebuilding context every time an AI system is used burns through tokens and slows processes.
HPE is addressing this by integrating its Alletra storage systems with Private Cloud AI to automatically apply policies for governance and metadata. The company is using a technique called KV (key-value) caching, which stores the intermediate attention state a model has already computed so that context does not have to be rebuilt on every request. In effect, storage becomes active memory for AI rather than a passive repository.
- Context Preservation: KV cache keeps data, context, and intelligence available wherever AI needs it, reducing the time GPUs spend waiting for data.
- Throughput Improvement: In tests, HPE's Alletra MX 10000 storage system provided 20 times faster time-to-first-token and 17 times higher throughput compared to standard approaches.
- Economic Impact: By turning storage into an active part of AI efficiency, organizations can increase the amount of useful work their infrastructure performs per dollar spent.
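A rough way to see why preserving context matters for a multi-step agent is to count how many tokens the model has to process with and without a cache. The sketch below is a simplified counting model, not an implementation of attention and not a description of HPE's Alletra integration. The step sizes are hypothetical, and the result is unrelated to the benchmark figures above.

```python
def prefill_tokens(step_sizes, use_cache):
    """Count tokens the model must process across a multi-step agent run.

    step_sizes[i] is the number of new tokens added to the context at step i.
    Without a cache, every step reprocesses the whole context so far.
    With a cache, the stored state for earlier tokens is reused, so only
    the new tokens are processed.
    """
    processed = 0
    context_len = 0
    for new_tokens in step_sizes:
        context_len += new_tokens
        processed += new_tokens if use_cache else context_len
    return processed


# Hypothetical run: a 2,000-token prompt followed by nine 500-token steps.
steps = [2000] + [500] * 9

without_cache = prefill_tokens(steps, use_cache=False)
with_cache = prefill_tokens(steps, use_cache=True)

print(f"Without cache: {without_cache:,} tokens processed")  # 42,500
print(f"With cache:    {with_cache:,} tokens processed")     # 6,500
```

Under these assumptions the uncached run processes roughly 6.5 times as many tokens, and the gap widens as the run gets longer because the uncached cost grows with the square of the number of steps.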
The broader trend is clear: enterprises are discovering that AI economics increasingly resemble infrastructure economics. Success depends on utilization, efficiency, and scale, not just the quality of the AI model itself. As agentic AI becomes more prevalent, the organizations that master on-premises infrastructure and efficient token management will gain significant competitive advantages over those still relying on cloud-only approaches.