
Why AI Agents Need Audit Trails Before They Need Features

AI agents are moving from experimental pilots to production infrastructure, but the legal and compliance professionals building them are prioritizing foundations over features. At FutureLaw 2026 in Tallinn, a convergence of sessions on trust, legal operations, and infrastructure revealed a single operating principle: invest in architecture and observability before rolling out autonomous capabilities.

What Makes Lawyers Trust AI Agents in Production?

The trust problem for AI agents isn't primarily technical; it's structural. When organizations deploy autonomous systems that make decisions or take actions without human approval at each step, the question shifts from "Does this AI work?" to "Can I prove what this AI did and why?"

"When we have a new actor in our environments, we have a new risk. People user doesn't equal AI agents. We have to focus on how preparing the infrastructure strictly dedicated to agent activity," said Marek Laskowski, Chief Information Officer at the Polish firm DZP.

Marek Laskowski, Chief Information Officer at DZP

Laskowski's point cuts to the heart of the infrastructure challenge. AI agents aren't just tools that humans use; they're actors inside a firm's systems that can initiate workflows, access data, and interact with external services. That requires a different security and compliance posture than traditional software.

Jamie Tso, founder of LegalQuants, argued that traceability is the foundation of trust. When a lawyer can trace an AI agent's output back to source text, paragraph by paragraph, confidence rises. But the real shift in thinking came from Damien Riehl, a solutions champion at Clio, who reframed the entire conversation: instead of humans "in the loop" approving every action, organizations should move to humans "on the loop," watching the reasoning and intervening only when necessary.

How to Build AI Agent Infrastructure That Regulators and Lawyers Will Accept

  • Audit Trail Architecture: Design systems that capture every decision an AI agent makes, the data it accessed, and the reasoning behind each action (a minimal record sketch follows this list). This is non-negotiable for regulatory compliance under frameworks like the EU's NIS2 directive and DORA (Digital Operational Resilience Act).
  • Third-Party Verification: Don't rely on vendor declarations that data won't be used for model training. Verify third-party landscapes against actual architecture diagrams, not marketing claims. Laskowski emphasized that many providers claim data safety, but independent verification is essential.
  • Infrastructure-First Deployment: Before adding agent features, establish dedicated infrastructure for agent activity. This includes monitoring, logging, and isolation from human-operated systems.
  • Observability and Monitoring: Build systems that let humans watch agent reasoning in real time, not just review outputs after the fact.
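To make the audit-trail point concrete, here is a minimal sketch of what a tamper-evident agent audit record could look like. The schema, field names, file path, and the SHA-256 hash-chaining choice are illustrative assumptions, not a format mandated by NIS2 or DORA:

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentAuditEvent:
    """One immutable record of an agent decision (hypothetical schema)."""
    agent_id: str        # which agent acted
    action: str          # what it did, e.g. "flag_liability_clause"
    data_accessed: list  # sources the agent read to make the decision
    rationale: str       # the agent's stated reasoning
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def append_event(event: AgentAuditEvent, path: str, prev_hash: str) -> str:
    """Append the event as a JSON line, chaining each record to the
    previous one via SHA-256 so retroactive edits are detectable."""
    record = asdict(event)
    record["prev_hash"] = prev_hash
    line = json.dumps(record, sort_keys=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return hashlib.sha256(line.encode("utf-8")).hexdigest()

# Usage: each call returns the hash that anchors the next record.
h = append_event(
    AgentAuditEvent(
        agent_id="contract-review-agent",
        action="flag_liability_clause",
        data_accessed=["msa_v3.docx", "playbook.pdf"],
        rationale="Liability cap conflicts with playbook section 4.2",
    ),
    path="agent_audit.jsonl",
    prev_hash="genesis",
)
```

Chaining each record to the hash of its predecessor means an auditor can detect any after-the-fact modification by re-walking the file, which is the property "Can I prove what this AI did?" ultimately rests on.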

The legal operations teams surveyed at FutureLaw 2026 reported a trust score of 6.6 out of 10 for AI output, with three repeating frustrations: too many tools, IT departments restricting which tools can be used, and hallucinations, including AI systems answering in Russian when prompted in Latvian or Estonian.

Why Pilot Design Matters More Than Model Performance

Joe Cohen, a legal innovation partner at Harvey and former director of advanced client solutions at Charles Russell Speechlys, outlined a framework for AI agent pilots that sidesteps the hype cycle. Pilots should run between two and twelve weeks, include cohorts balanced by practice area and seniority, and capture data granular enough to distinguish short-form from long-form drafting and template-based from prompt-based work.

This specificity matters because it prevents organizations from making broad claims like "AI agents increased productivity by 30%" when the actual impact varies dramatically by task type. A template-based task might see 80% time savings, while complex legal analysis might see 10% or even negative returns if the AI agent's output requires extensive human review.
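A minimal sketch of the per-task-type arithmetic this implies, with invented numbers standing in for real pilot measurements:

```python
from collections import defaultdict

# Hypothetical pilot measurements (minutes per task, with and without the
# agent). The numbers are invented for illustration, not survey data.
pilot_tasks = [
    {"type": "template_drafting",  "baseline_min": 60,  "with_agent_min": 12},
    {"type": "template_drafting",  "baseline_min": 45,  "with_agent_min": 10},
    {"type": "long_form_analysis", "baseline_min": 180, "with_agent_min": 175},
    {"type": "long_form_analysis", "baseline_min": 150, "with_agent_min": 165},
]

# Aggregate minutes by task type so savings are reported per category,
# never as a single firm-wide percentage.
totals = defaultdict(lambda: {"baseline": 0, "with_agent": 0})
for task in pilot_tasks:
    totals[task["type"]]["baseline"] += task["baseline_min"]
    totals[task["type"]]["with_agent"] += task["with_agent_min"]

for task_type, t in totals.items():
    saving = (t["baseline"] - t["with_agent"]) / t["baseline"]
    print(f"{task_type}: {saving:+.0%} time saved")
# template_drafting: +79% time saved
# long_form_analysis: -3% time saved (review overhead eats the gain)
```

Reporting the signed per-category figure is what keeps a pilot from laundering an 80% template win and a negative analysis result into one misleading average.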

"If one doesn't know to which port one is sailing, the wind is not favorable," said Mori Kabiri, founder of Legal Operations KPIs, paraphrasing Seneca to argue that legal departments should start with business outcomes, not technology features.

Mori Kabiri, Founder of Legal Operations KPIs

The mistake most organizations make is leading vendor conversations with feature sets rather than with a defined business outcome and a baseline key performance indicator (KPI). This inverts the decision-making process and leads to expensive pilots that prove a tool works but not that it solves a real problem.

What's Actually Happening in Enterprise Legal Departments Right Now

The most striking data point from FutureLaw 2026 came from Damien Riehl, who reported that an unnamed Fortune 20 company has built 150 internal workflows for work it used to outsource to law firms. The CEO instructed employees: "Do not talk to an in-house lawyer unless absolutely necessary. If necessary, talk to an in-house lawyer, but almost under no circumstances do you go to an external lawyer."

Riehl framed this not as a future scenario but as present reality. "This is not the future, this is today. The future is here. It's just not evenly distributed," he said. The implication is clear: organizations that build robust AI agent infrastructure now will gain a structural advantage over those waiting for the technology to mature.

But that advantage only materializes if the infrastructure is sound. A trust score of 6.6 out of 10 among legal operations teams suggests that most organizations are still in the early stages of building that foundation. The organizations moving fastest are those that treat AI agents as infrastructure actors requiring dedicated governance, not as features to be bolted onto existing systems.