Why Cloud Native Infrastructure Is Becoming Essential for AI Agents
AI agents operating in production environments require fundamentally different infrastructure than the frameworks used to build them. Rather than treating agents as software modules that run inside a single application, leading organizations are deploying each agent as an independent workload on Kubernetes, the open source container orchestration platform. This architectural shift is reshaping how enterprises think about agent safety, observability, and compliance.
What Makes Cloud Native Different for AI Agents?
The traditional approach to building multi-agent systems treats all agents as components within one process. This works fine for prototypes and demos on a laptop, but it creates a critical vulnerability in production: if one agent gets stuck waiting for a model API response or enters an infinite loop, it can drag down every other agent in the system. By contrast, deploying each agent as its own Kubernetes Deployment, with its own resource limits, identity, and restart policy, isolates failures and allows the system to continue operating even when one agent encounters problems.
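As a rough illustration of the per-agent isolation described above, the sketch below generates one Kubernetes Deployment manifest per agent, each with its own resource limits, ServiceAccount, and restart policy. The agent names, image registry, ServiceAccount naming scheme, and resource values are assumptions made for the example, not details of any system described in this article.

```python
import json


def agent_deployment(name: str, image: str, cpu: str = "500m", memory: str = "512Mi") -> dict:
    """Build an apps/v1 Deployment manifest for a single agent."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Per-agent identity: a dedicated ServiceAccount
                    # (hypothetical naming scheme; it must be created separately).
                    "serviceAccountName": f"{name}-sa",
                    # Pods managed by a Deployment only accept "Always".
                    "restartPolicy": "Always",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # A stuck or looping agent is capped at these limits
                            # and cannot starve the other agents.
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory},
                                "limits": {"cpu": cpu, "memory": memory},
                            },
                        }
                    ],
                },
            },
        },
    }


# Hypothetical agent names and image registry, for illustration only.
AGENTS = ["anomaly-detector", "threat-analyzer", "remediator", "notifier"]
manifests = [
    agent_deployment(agent, f"registry.example.com/agents/{agent}:1.0.0")
    for agent in AGENTS
]

if __name__ == "__main__":
    # JSON manifests can be applied with kubectl just like YAML ones.
    print(json.dumps(manifests[0], indent=2))
```

Because each manifest is a separate Deployment, a crash or rollout in one agent leaves the others untouched, which is the property the single-process approach lacks.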
Orange Innovation, a division of the French telecommunications company Orange, has built an internal real-time security operations platform that demonstrates this approach at scale. The system uses a Coordinator Agent built with LangGraph, an open source framework for building agent reasoning loops, to orchestrate four specialized agents: one that detects anomalies, one that analyzes threats, one that executes remediation actions, and one that sends notifications. Each agent runs independently, communicating through a protocol called A2A, which was open-sourced in 2025 and is now governed by the Linux Foundation.
How Do Organizations Keep Agents Safe and Accountable?
Safety constraints for AI agents cannot rely on prompt engineering alone. Instead, leading organizations are codifying safety rules as policy-as-code, using tools like Open Policy Agent (OPA) and Kyverno, which are Kubernetes-native policy engines. In Orange's system, a reviewer agent evaluates whether a proposed action is safe to execute by consulting these policies, receiving a deterministic verdict rather than relying on the language model's reasoning about safety.
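A minimal sketch of how a reviewer agent might ask OPA for a deterministic verdict is shown below. It uses OPA's REST Data API, which accepts a POST to /v1/data/<policy path> with an "input" document and returns the decision under "result". The OPA address, the policy path, and the shape of the action document are assumptions; Orange's actual integration is not described in the source.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"         # assumed address of a local OPA server
POLICY_PATH = "agents/remediation/allow"  # hypothetical policy path


def build_opa_request(action: dict) -> urllib.request.Request:
    """Wrap the proposed action in the {"input": ...} envelope OPA expects."""
    body = json.dumps({"input": action}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/{POLICY_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_verdict(response_body: bytes) -> bool:
    """Fail closed: a missing or non-boolean result counts as a denial."""
    result = json.loads(response_body).get("result")
    return result is True


def is_action_allowed(action: dict, timeout: float = 2.0) -> bool:
    """Ask OPA for a verdict; any transport or parsing error is a denial."""
    try:
        with urllib.request.urlopen(build_opa_request(action), timeout=timeout) as resp:
            return parse_verdict(resp.read())
    except (OSError, ValueError):
        return False
```

The same input always yields the same verdict, and the policy file behind that path can be reviewed and tested independently of any prompt. Failing closed when OPA is unreachable is a design choice for this sketch, not a requirement of OPA.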
This shift from prompt-based safety to policy-based safety is critical for compliance. Regulators and auditors are moving beyond asking "did you have the right access controls?" to asking "what did your agents actually do, and how do you know it was appropriate?" A defensible audit trail requires both execution observability, which captures what the agent did, and intent observability, which captures why it did it. Most enterprise programs currently deliver only the first.
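One way to make the distinction concrete is an audit record that stores both kinds of information side by side. The sketch below is an assumption about what such a record could look like; the field names and the coverage helper are hypothetical and not drawn from any specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    trace_id: str
    agent: str
    # Execution observability: what the agent did.
    action: str
    target: str
    outcome: str
    # Intent observability: why it did it.
    rationale: Optional[str] = None
    policy_verdict: Optional[str] = None

    def has_intent(self) -> bool:
        """True only when both the reasoning and the policy verdict were captured."""
        return bool(self.rationale) and bool(self.policy_verdict)


def intent_coverage(records: list[AuditRecord]) -> float:
    """Share of records (0.0 to 1.0) that capture intent as well as execution."""
    if not records:
        return 0.0
    return sum(r.has_intent() for r in records) / len(records)


records = [
    AuditRecord("t-1", "remediator", "isolate-host", "db-1", "success",
                rationale="lateral movement suspected", policy_verdict="allow"),
    AuditRecord("t-2", "remediator", "rotate-key", "svc-7", "success"),
]
print(f"intent coverage: {intent_coverage(records):.0%}")
```

A program that logs only the execution fields would score zero on this measure even with complete action logs, which is the gap the paragraph above describes.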
Regulatory activity is accelerating. In February 2026, the National Institute of Standards and Technology (NIST) formally launched its AI Agent Standards Initiative, establishing a three-pillar program to standardize agent security, interoperability, and identity. This was the first time NIST treated agentic AI as a distinct standardization priority. The European Union's AI Act, which entered into force in 2024, creates specific obligations for high-risk AI systems, including requirements for transparency, human oversight, and audit trail maintenance that apply directly to enterprise agent deployments.
Steps to Build Production-Ready Agent Infrastructure
- Deploy agents as independent workloads: Each agent should run as its own Kubernetes Deployment with isolated resource limits, identity credentials, and restart policies. This prevents one agent's failure from cascading to others and allows for independent scaling and updates.
- Implement mTLS for inter-agent communication: Use certificate-based mutual authentication at the transport layer rather than relying on a service mesh. Tools like cert-manager can issue per-agent identities, and network policies can restrict which agents can reach which services.
- Codify safety constraints as policy-as-code: Move safety rules out of language model prompts and into version-controlled policy files using OPA or Kyverno. These policies should be code-reviewed, unit-tested, and auditable like any other infrastructure-as-code artifact.
- Propagate trace IDs through all agent communications: Every task should carry a unique trace ID that flows through the entire reasoning chain. This allows operators to reconstruct the complete decision-making process for any action, which is essential for both debugging and compliance investigations.
- Use classical machine learning to pre-filter before invoking language models: Place a lightweight anomaly detection model in front of the language model tier to reduce unnecessary LLM invocations. Orange's system uses an Isolation Forest model that scores events in microseconds, reducing token costs and latency.
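The last two steps in the list can be sketched together: a cheap pre-filter decides whether an event is worth a language model call, and events that pass are wrapped in a task whose trace ID follows it through every agent hop. This is a minimal sketch under stated assumptions. The `Task` envelope, the agent names, and `toy_score` are hypothetical, and `toy_score` is a stand-in heuristic, not an Isolation Forest; a trained model's scoring function could be passed in its place (here, higher scores mean more anomalous).

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    payload: dict
    # One trace ID per task, generated once and never changed.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    hops: list = field(default_factory=list)

    def record_hop(self, agent: str, note: str) -> "Task":
        """Append one step of the reasoning chain, tagged with the trace ID."""
        self.hops.append({"trace_id": self.trace_id, "agent": agent, "note": note})
        return self


def prefilter(event: dict, score: Callable[[dict], float], threshold: float) -> Optional[Task]:
    """Return a Task only for events scoring at or above the threshold."""
    s = score(event)
    if s < threshold:
        return None  # dropped before any language model is invoked
    return Task(payload=event).record_hop("anomaly-detector", f"score={s:.2f}")


def toy_score(event: dict) -> float:
    """Stand-in scorer (assumption): failed logins scaled into the range 0..1."""
    return min(event.get("failed_logins", 0) / 10.0, 1.0)


task = prefilter({"host": "db-1", "failed_logins": 9}, toy_score, threshold=0.5)
if task is not None:
    task.record_hop("threat-analyzer", "classified as brute-force attempt")
    task.record_hop("remediator", "proposed: lock account")
    for hop in task.hops:
        print(hop["trace_id"][:8], hop["agent"], "-", hop["note"])
```

Filtering the recorded hops by a single trace ID is then enough to reconstruct the full chain for one action. In a distributed deployment the ID would travel in message metadata or headers rather than in a shared object.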
Orange's architecture includes a human-in-the-loop mechanism by design, not as an afterthought. Every consequential decision has three possible outcomes: auto-execute, auto-reject, or escalate to a human analyst. The decision to escalate is itself a deterministic policy verdict, not a cultural practice. Escalation triggers when reviewer confidence falls below a threshold, when the affected asset is on an always-escalate list such as control-plane components or customer-facing systems, or when the proposed action would exceed a configured blast radius.
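The three-outcome decision can be expressed as a small pure function, sketched below. The thresholds, the asset names, and the rule ordering (escalation triggers are checked before the policy result) are assumptions for illustration. The source does not say when an action is auto-rejected, so this sketch assumes that happens when the policy engine denies it.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    AUTO_EXECUTE = "auto-execute"
    AUTO_REJECT = "auto-reject"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Proposal:
    asset: str
    policy_allows: bool         # deterministic verdict from the policy engine
    reviewer_confidence: float  # 0.0 to 1.0
    affected_hosts: int         # simple proxy for blast radius


# Hypothetical configuration values.
ALWAYS_ESCALATE = frozenset({"control-plane", "customer-portal"})
MIN_CONFIDENCE = 0.8
MAX_BLAST_RADIUS = 5


def decide(p: Proposal) -> Verdict:
    """Same input, same verdict: no language model is consulted here."""
    if p.asset in ALWAYS_ESCALATE:
        return Verdict.ESCALATE
    if p.reviewer_confidence < MIN_CONFIDENCE:
        return Verdict.ESCALATE
    if p.affected_hosts > MAX_BLAST_RADIUS:
        return Verdict.ESCALATE
    if not p.policy_allows:
        return Verdict.AUTO_REJECT
    return Verdict.AUTO_EXECUTE


print(decide(Proposal("web-42", True, 0.95, 1)).value)
print(decide(Proposal("control-plane", True, 0.99, 1)).value)
```

Because the function has no hidden state, each escalation rule can be unit-tested and reviewed like any other policy artifact.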
What Are Enterprises Actually Doing With Agents Today?
While regulatory frameworks are still being finalized, enterprises are already deploying agents at scale. DBS, one of Asia's largest banks, created 10,000 personal agents in just a few months after the capability became available in late 2025. These personal agents handle tasks like curating news and market information for executives. The bank is also trialing team agents that help analysts compare financial data from multiple sources to set deposit and mortgage rates, a task that previously required manual data collation.
DBS is implementing what it calls "harness engineering," an approach intended to ensure that agents adhere to policies, governance frameworks, and standard operating procedures. The bank maintains an agent registry with visibility into all deployed agents, their identities, and their capabilities. It has also established safety guardrails that provide accountability, observability, traceability, and evaluation of agent behavior.
"We have to be aware that a dystopian view is also very much a possibility. That's why we have a governance framework and set up a control plane," said Tan Su Shan, CEO of DBS.
The most significant impact is expected in enterprise agents that handle complex workflows. DBS is working on agent deployments across 11 areas, including wealth management, small and medium-sized enterprise banking, institutional banking, risk management, and legal and compliance. One early success is in credit memo writing within institutional banking. Relationship managers previously spent the bulk of their time reading market reports, analyzing financial statements, and assessing creditworthiness. Now, a network of up to 70 agents can produce a draft credit memo that the manager reviews before making the final decision. This frees relationship managers to spend more time meeting with existing clients and prospecting for new business.
What Compliance Metrics Matter Most?
For board-level reporting on agent risk, three metrics have proven particularly effective for translating technical architecture into business outcomes:
- Least agency ratio per agent class: Tracks the gap between what each agent class can access and how tightly its autonomous decisions are constrained. This metric surfaces specific deployments where behavioral constraints are not keeping pace with permission scope.
- Five-signal coverage percentage: Measures the share of deployed agents monitored across all five signal domains. An agent monitored on fewer than five signals has coverage gaps that sophisticated attacks already exploit.
- Step mutation intervention rate: Measures the percentage of flagged agent actions that were rewritten in-flight rather than blocked or allowed to pass. This reflects the maturity of the security program's response capability and shows whether the organization can do more than simply block or allow actions.
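Two of these metrics reduce to simple percentages, sketched below under stated assumptions. The input shapes (a count of monitored signal domains per agent, and a list of outcome labels for flagged actions) and the label "rewritten" are hypothetical. The least agency ratio is left out because the text above does not give a formula for it.

```python
def five_signal_coverage(signals_per_agent: dict[str, int], required: int = 5) -> float:
    """Percentage of agents monitored on all required signal domains."""
    if not signals_per_agent:
        return 0.0
    covered = sum(1 for count in signals_per_agent.values() if count >= required)
    return 100.0 * covered / len(signals_per_agent)


def step_mutation_rate(flagged_outcomes: list[str]) -> float:
    """Percentage of flagged actions rewritten in-flight rather than blocked or allowed."""
    if not flagged_outcomes:
        return 0.0
    return 100.0 * flagged_outcomes.count("rewritten") / len(flagged_outcomes)


# Hypothetical inventory: agent name -> number of signal domains monitored.
inventory = {"agent-a": 5, "agent-b": 3, "agent-c": 5, "agent-d": 4}
# Hypothetical outcomes for actions that were flagged.
outcomes = ["rewritten", "blocked", "allowed", "rewritten"]

print(f"five-signal coverage: {five_signal_coverage(inventory):.1f}%")
print(f"step mutation intervention rate: {step_mutation_rate(outcomes):.1f}%")
```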
Organizations that use existing compliance requirements as a foundation, building governance that satisfies current obligations while anticipating future ones, will be better positioned than those who treat compliance as a ceiling. The direction is clear: behavioral monitoring is becoming a baseline requirement, not an advanced capability.
The convergence of cloud native infrastructure, policy-as-code governance, and regulatory frameworks is creating a new category of production-ready agent systems. Organizations that invest in this foundation now will find it easier to scale agents safely and maintain compliance as regulations become more specific. Those that delay will face the challenge of retrofitting governance and observability into systems that were not designed for it.