Hermes Agent's New Tool Search Feature Cuts Context Bloat by 85%, Boosts Accuracy to 88%

FrontierNews.ai AI Research Desk

Nous Research has released a major update to its open-source Hermes Agent that targets a common problem in AI agent deployments: too many tool definitions consuming context space. The new Tool Search feature uses retrieval to load only the tool schemas an agent actually needs, rather than placing every connected tool's full JSON definition into the model's context window on every turn. According to Anthropic's internal evaluations, the effect is substantial: accuracy rose from 49% to 74% on Claude Opus 4, and from 79.5% to 88.1% on Claude Opus 4.5.

Why Are Tool Definitions Eating Up So Much Context?

When an AI agent connects to multiple MCP (Model Context Protocol) servers, every tool's JSON schema gets loaded into the model's visible context on every single turn, even if the agent only needs one or two tools for a given task. A real-world Hermes deployment with five MCP servers and 34 tools shows average prompt sizes of 45,000 tokens per turn, with roughly 22,000 tokens (about 50%) consumed by tool schema overhead alone. Anthropic's own engineering data shows tool definitions can consume 134,000 tokens before any optimization is applied.

This creates two immediate problems. First, there's a cost problem: cache-miss generations at session start can cost $0.07 to $0.10 per turn in API spend. Second, there's an accuracy problem. When a model sees hundreds of irrelevant tool options simultaneously, it experiences what researchers call "decision paralysis," leading to false positives and incorrect tool selections. A research paper measuring the "MCP Tools Tax" found that tool definitions alone consume 15,000 to 60,000 tokens per turn for typical multi-server deployments.

How Does Tool Search Actually Work?

Rather than loading all tool schemas upfront, Tool Search replaces them with three lightweight bridge tools that the model can call on demand. When the agent needs a tool, it follows a three-step retrieval sequence:

Search: The model calls tool_search(query, limit?) to find relevant tools using BM25, a classic information retrieval algorithm that matches against tool names, descriptions, and parameter names.
Describe: Once a match is found, the model calls tool_describe(name) to load the full JSON schema for only that specific tool into context.
Execute: The model then calls tool_call(name, arguments) to invoke the tool, with all hooks, guardrails, and approval prompts running against the real underlying tool name, not the bridge.
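The three-step sequence above can be sketched in a few lines of Python. This is only an illustration, not the Hermes implementation: the in-memory REGISTRY, the two sample tools, and the word-overlap ranking (a stand-in for BM25) are all assumptions made for the example. Only the three bridge names and their parameters come from the description above.

```python
from typing import Any, Callable

# Hypothetical in-memory registry: tool name -> (JSON-like schema, implementation).
REGISTRY: dict[str, tuple[dict[str, Any], Callable[..., Any]]] = {
    "github_create_issue": (
        {
            "name": "github_create_issue",
            "description": "Create an issue in a GitHub repository",
            "parameters": {"repo": "string", "title": "string"},
        },
        lambda repo, title: f"created '{title}' in {repo}",
    ),
    "weather_lookup": (
        {
            "name": "weather_lookup",
            "description": "Look up the current weather for a city",
            "parameters": {"city": "string"},
        },
        lambda city: f"sunny in {city}",
    ),
}


def tool_search(query: str, limit: int = 5) -> list[str]:
    """Step 1: return matching tool names only, never full schemas.

    Placeholder ranking: counts shared words across the tool name,
    description, and parameter names. The article says BM25 is used instead.
    """
    terms = set(query.lower().split())
    scored = []
    for name, (schema, _) in REGISTRY.items():
        text = " ".join(
            [name.replace("_", " "), schema["description"], *schema["parameters"]]
        ).lower()
        overlap = len(terms & set(text.split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


def tool_describe(name: str) -> dict[str, Any]:
    """Step 2: load the full schema for one specific tool."""
    return REGISTRY[name][0]


def tool_call(name: str, arguments: dict[str, Any]) -> Any:
    """Step 3: invoke the real tool by its underlying name.

    A real agent would run hooks, guardrails, and approvals against `name` here.
    """
    _, implementation = REGISTRY[name]
    return implementation(**arguments)
```

In this sketch the model only ever sees the three bridge signatures up front; a schema enters context when tool_describe is called for a name that tool_search returned.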

The system uses BM25 for matching, with a fallback to literal substring matching if no positive-score hits are found. The fallback covers edge cases such as a query term that appears in every tool in the catalog (like "github"): BM25 gives such a term little or no weight because it cannot distinguish one tool from another. The tool catalog is stateless across turns. It is rebuilt from the current tool definitions on every assembly, which prevents drift bugs where a stored catalog goes out of sync with the live tool registry.
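A minimal sketch of BM25 ranking with a substring fallback follows. It assumes the classic Okapi BM25 inverse-document-frequency formula, which goes negative for a term found in every document; the k1 and b values, the tokenizer, and the function names are illustrative choices, not taken from Hermes. Each tool is represented as one string built from its name, description, and parameter names.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Score every document against the query with classic Okapi BM25."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            n = doc_freq[term]
            # Classic IDF: negative when the term is in most documents.
            idf = math.log((n_docs - n + 0.5) / (n + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[name] = score
    return scores


def search(query: str, docs: dict[str, str], limit: int = 5) -> list[str]:
    """Return BM25 hits with a positive score, else literal substring matches."""
    if not docs:
        return []
    scores = bm25_scores(query, docs)
    hits = sorted((n for n, s in scores.items() if s > 0),
                  key=lambda n: (-scores[n], n))
    if not hits:
        needle = query.lower()
        hits = sorted(n for n, text in docs.items() if needle in text.lower())
    return hits[:limit]
```

With a catalog in which every tool mentions "github", a query for "github" produces no positive BM25 score, so the substring fallback returns the matching tools instead of an empty result.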

What Are the Real-World Accuracy Gains?

Anthropic's internal MCP evaluations show significant accuracy improvements when Tool Search is enabled. On Claude Opus 4, accuracy improved from 49% to 74%, a 25-percentage-point gain. On Claude Opus 4.5, accuracy jumped from 79.5% to 88.1%, an 8.6-percentage-point improvement. The accuracy gains stem directly from removing decision paralysis. When irrelevant tool schemas are no longer cluttering the model's context, the model makes better decisions about which tools to use.

Alongside accuracy improvements, Anthropic's data shows an 85% reduction in tool-definition token usage while maintaining full access to the entire tool library. This means agents can connect to dozens of tools without the token cost that previously made such deployments impractical.

How to Configure Tool Search in Hermes

Tool Search is opt-in. Once enabled, it runs in auto mode by default, activating only when deferrable tool schemas would consume at least 10% of the active model's context window. Below that threshold, the tools array assembly is a pure pass-through with no overhead. A session with just a few MCP tools and a long-context model may never activate Tool Search, while a session with 15 or more tools will typically trigger it.
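The activation rule can be expressed as a small check. The sketch below is an assumption about how such a gate might look, not Hermes source code: the function name and arguments are hypothetical, and the 200,000-token context length in the usage note is an example value rather than a figure from the article.

```python
def should_activate(enabled: str, schema_tokens: int, context_length: int,
                    deferrable_tools: int, threshold_pct: float = 10.0) -> bool:
    """Decide whether tool schemas should be deferred behind Tool Search.

    enabled: "auto", "on", or "off".
    schema_tokens: estimated tokens used by deferrable tool schemas.
    context_length: context window of the active model, in tokens.
    """
    if enabled == "off" or deferrable_tools == 0:
        return False
    if enabled == "on":
        # "on" activates whenever there is at least one deferrable tool.
        return True
    # "auto": activate once schemas reach threshold_pct of the context window.
    return schema_tokens * 100 >= threshold_pct * context_length
```

For instance, the 22,000 tokens of schema overhead cited earlier would be 11% of an assumed 200,000-token window, so should_activate("auto", 22000, 200000, 34) returns True, while 5,000 tokens of schemas in the same window would stay below the default threshold.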

Developers can control Tool Search behavior by adding configuration to their hermes.yaml file:

enabled: Set to "auto" (default, activates above threshold), "on" (always activates if there's at least one deferrable tool), or "off" (disables entirely).
threshold_pct: The percentage of context length at which auto mode kicks in, ranging from 0 to 100 (default is 10).
search_default_limit: The number of hits returned when the model calls tool_search without specifying a limit (default is 5).
max_search_limit: A hard upper bound on the number of results the model can request via the limit parameter, ranging from 1 to 50 (default is 20).

Developers can also use a simple boolean shorthand: setting "tools: tool_search: true" is equivalent to enabling auto mode.
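To make the options and their documented ranges concrete, here is a hedged Python sketch of how a loader might normalize the value found under tools: tool_search once the YAML has been parsed. The ToolSearchConfig class, parse_tool_search, and effective_limit are hypothetical names for illustration; the defaults and ranges are the ones listed above. Capping an oversized limit request, rather than rejecting it, is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ToolSearchConfig:
    enabled: str = "auto"          # "auto", "on", or "off"
    threshold_pct: float = 10      # 0 to 100
    search_default_limit: int = 5  # hits returned when no limit is given
    max_search_limit: int = 20     # 1 to 50


def parse_tool_search(value: Any) -> ToolSearchConfig:
    """Normalize the parsed `tools: tool_search:` value into a config object."""
    if value is True:
        # Boolean shorthand: equivalent to auto mode with default settings.
        return ToolSearchConfig()
    if not isinstance(value, dict):
        raise ValueError("tool_search must be true or a mapping of options")
    cfg = ToolSearchConfig(**value)
    if cfg.enabled not in ("auto", "on", "off"):
        raise ValueError(f"unknown enabled value: {cfg.enabled!r}")
    if not 0 <= cfg.threshold_pct <= 100:
        raise ValueError("threshold_pct must be between 0 and 100")
    if not 1 <= cfg.max_search_limit <= 50:
        raise ValueError("max_search_limit must be between 1 and 50")
    return cfg


def effective_limit(cfg: ToolSearchConfig, requested: Optional[int] = None) -> int:
    """Apply the default when no limit is given, then cap at the hard maximum."""
    wanted = cfg.search_default_limit if requested is None else requested
    return min(wanted, cfg.max_search_limit)
```

Under these assumptions, parse_tool_search(True) yields the same settings as spelling out every default, and a model asking for 100 results would receive at most max_search_limit.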

What Does This Mean for Hermes Deployments?

The Tool Search feature addresses a fundamental scaling problem in agent systems. As enterprises connect more MCP servers and plugins to their agents, context window consumption becomes a bottleneck that limits both accuracy and cost-efficiency. By deferring tool schema loading until needed, Hermes Agent enables deployments with dozens of connected tools without the token overhead that previously made such configurations impractical.

Hermes Agent, built by Nous Research, has become a significant player in the open-source agent ecosystem. The framework crossed 140,000 GitHub stars in under three months and processed 224 billion tokens in a single day on OpenRouter, according to usage data from May 2026. The platform's appeal lies in its focus on persistence and self-improvement. Unlike stateless chatbots, Hermes lives on a user's machine, remembers everything across sessions, and improves over time through a closed learning loop.

The Tool Search release demonstrates how the open-source agent community is solving real deployment challenges. Rather than waiting for proprietary solutions, developers at Nous Research identified a concrete problem, built a solution grounded in information retrieval theory, and validated it against Anthropic's evaluation benchmarks. For teams deploying Hermes in production environments with multiple tool integrations, Tool Search offers a straightforward way to improve both accuracy and cost efficiency without architectural changes.
