FrontierNews.ai

Hermes Agent Just Got 60x Faster at Reading the Web. Here's What Changed.

Nous Research has released a major update to Hermes Agent that dramatically speeds up web research while slashing costs. The self-improving agent now reads web pages up to 60 times faster and at 49% lower cost, thanks to a fundamental redesign of how it handles content extraction and summarization.

What's the bottleneck that Hermes just fixed?

The old pipeline had a critical inefficiency: web scraping backends returned raw content, which was then processed more than once before reaching the agent. Handling the same information repeatedly wasted both time and compute. The new architecture eliminates that redundancy by having backends pass clean content directly to the agent.

For large pages, Hermes now saves content locally and pages through it on demand, rather than forcing the agent to process everything at once. This simple shift in how data flows through the system produces dramatic improvements without sacrificing quality.
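The save-then-page idea can be illustrated with a minimal Python sketch. This is not Hermes code: the function names `save_page` and `read_window`, the file name, and the 5,000-character window are all hypothetical choices made for illustration.

```python
import tempfile
from pathlib import Path


def save_page(text: str) -> Path:
    """Write extracted page text to a local file (hypothetical cache layout)."""
    directory = Path(tempfile.mkdtemp(prefix="page_cache_"))
    path = directory / "page.md"
    path.write_text(text, encoding="utf-8")
    return path


def read_window(path: Path, offset: int = 0, limit: int = 5_000) -> dict:
    """Return one window of the saved page plus where the next window starts."""
    text = path.read_text(encoding="utf-8")
    chunk = text[offset:offset + limit]
    next_offset = offset + len(chunk)
    return {
        "content": chunk,
        # None signals that the caller has reached the end of the page.
        "next_offset": next_offset if next_offset < len(text) else None,
        "total_chars": len(text),
    }
```

An agent using something like this would ask for one window at a time and follow `next_offset` only if it still needs more of the page, instead of loading the whole document into context.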

How does Hermes handle different page sizes?

The update introduces a tiered approach to content processing based on page length:

  • Small pages (under 5,000 characters): Returned as-is with no language model processing required, delivering full markdown directly to the agent.
  • Medium pages (5,000 to 500,000 characters): Processed through a single-pass summary using an auxiliary model, capped at roughly 5,000 characters of output while preserving quotes, code blocks, and key facts.
  • Large pages (500,000 to 2,000,000 characters): Chunked into 100,000-character pieces and summarized in parallel, with final synthesis condensed to about 5,000 characters.
  • Extremely large pages (over 2,000,000 characters): Refused with a suggestion to use focused extraction instructions instead.
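
The tiers above can be sketched in Python as a simple size check. The thresholds come from the list; everything else is an assumption for illustration, including the tier names, the function names, the thread pool, and how the exact boundary values (5,000 and 500,000 characters) are assigned, since the ranges as stated overlap at those points.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SMALL_MAX = 5_000        # below this: returned as-is
MEDIUM_MAX = 500_000     # up to this: single-pass summary
LARGE_MAX = 2_000_000    # up to this: chunked summary; above: refused
CHUNK_SIZE = 100_000     # chunk length for large pages


def pick_tier(n_chars: int) -> str:
    """Map a page length to a processing tier (boundary handling is assumed)."""
    if n_chars < SMALL_MAX:
        return "passthrough"
    if n_chars <= MEDIUM_MAX:
        return "single_pass"
    if n_chars <= LARGE_MAX:
        return "chunked"
    return "refuse"


def split_chunks(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Cut a large page into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def summarize_chunks(chunks: list[str], summarize: Callable[[str], str]) -> list[str]:
    """Summarize chunks concurrently; results keep the original chunk order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(summarize, chunks))
```

In this sketch, `summarize` stands in for a call to the auxiliary model. A final synthesis step would then condense the per-chunk summaries into one result.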

The summarization system functions as a content compressor rather than a paraphraser. If summarization fails, Hermes falls back gracefully to the first 5,000 characters of raw content without generating error messages.
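
The fallback behavior can be pictured as a thin wrapper around the summarizer. The sketch below is a guess at the shape of that logic, not the actual implementation. The name `summarize_or_fallback` is hypothetical, and treating any raised exception as a "failure" is an assumption.

```python
from typing import Callable

FALLBACK_CHARS = 5_000  # length of the raw-content fallback described above


def summarize_or_fallback(text: str, summarize: Callable[[str], str]) -> str:
    """Return the summary, or the first 5,000 raw characters if summarizing fails."""
    try:
        return summarize(text)
    except Exception:
        # Assumption: any summarizer error triggers the silent fallback,
        # so the agent receives usable content rather than an error message.
        return text[:FALLBACK_CHARS]
```

The design choice worth noting is that the agent always gets page content back, so a flaky auxiliary model degrades quality rather than breaking the research step.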

Where do the cost savings actually come from?

A key insight in the update is that not all processing needs premium computational resources. Previously, web extraction used the main language model by default, so on expensive models like Claude Opus every long page burned premium tokens on summarization. The new approach routes summarization to a cheaper auxiliary model, Google's Gemini Flash, while keeping reasoning on the premium model.

This architectural shift alone cuts web research costs significantly. Users can now configure which backend handles different tasks, mixing free and paid services strategically. For example, search-only providers like SearXNG or DuckDuckGo can pair with extraction services like Firecrawl or Tavily, allowing teams to run searches for free and only pay for extraction when needed.
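
The routing idea, with summarization on a cheap model and reasoning on the premium one, can be sketched as a small lookup. The class, the task label, and the model names below are placeholders invented for illustration; they are not Hermes configuration keys or real model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRouting:
    """Hypothetical routing table: which model handles which kind of task."""
    main_model: str
    auxiliary_model: str

    def model_for(self, task: str) -> str:
        # Only summarization is offloaded; everything else stays on the main model.
        return self.auxiliary_model if task == "summarize" else self.main_model


# Placeholder names, not real model IDs.
routing = ModelRouting(
    main_model="premium-reasoning-model",
    auxiliary_model="cheap-summarizer-model",
)
```

With a table like this, `routing.model_for("summarize")` picks the auxiliary model, while any other task label falls through to the main model.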

What backend options are available for web research?

Hermes now supports multiple web backends, each with different capabilities and pricing models:

  • Firecrawl: Offers search, extract, and crawl capabilities with 500 free credits per month.
  • SearXNG: Free, self-hosted search across 70+ engines with no API key required and no rate limits.
  • Brave Search: Search-only, with 2,000 free search queries per month.
  • DuckDuckGo (DDGS): Free search with no API key needed.
  • Tavily: Offers search, extract, and crawl with 1,000 free searches per month.
  • Exa: Provides search and extract capabilities with 1,000 free searches monthly.
  • xAI (Grok): Search-only with LLM-generated results.

Users can mix and match these services based on their needs. Search-only providers pair effectively with extraction services, creating flexible, cost-optimized workflows. For teams using the Tool Gateway through managed Firecrawl, web search and extraction are included with no separate API key or billing required.
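
The mix-and-match pairing can be illustrated with a capability table built from the list above. The dictionary keys and the `pick_backends` helper are hypothetical names for this sketch, not Hermes settings. The capabilities themselves follow the article's descriptions.

```python
# Capabilities per backend, as described in the list above.
BACKENDS: dict[str, set[str]] = {
    "firecrawl": {"search", "extract", "crawl"},
    "searxng": {"search"},
    "brave": {"search"},
    "ddgs": {"search"},
    "tavily": {"search", "extract", "crawl"},
    "exa": {"search", "extract"},
    "xai": {"search"},
}


def pick_backends(search: str, extract: str) -> dict[str, str]:
    """Validate a search/extract pairing against the capability table."""
    for capability, name in (("search", search), ("extract", extract)):
        if name not in BACKENDS:
            raise ValueError(f"unknown backend: {name}")
        if capability not in BACKENDS[name]:
            raise ValueError(f"{name} does not support {capability}")
    return {"search": search, "extract": extract}
```

For example, `pick_backends("searxng", "firecrawl")` pairs a free self-hosted search backend with a paid extractor, while asking a search-only backend such as `"brave"` to extract raises a `ValueError`.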

When should you skip summarization entirely?

The update acknowledges that summarization isn't always the right tool. If the language model summary drops important fields like structured data, tables, or specific formatting, users can instead employ browser navigation and snapshot functions. These return the live accessibility tree without auxiliary model rewriting, preserving the original structure and content.

This flexibility recognizes that different research tasks have different requirements. A financial report might need exact table formatting, while a news article benefits from summarization. Hermes now lets users choose the right tool for each situation.

The update represents a shift in how AI agents approach web research, moving away from one-size-fits-all processing toward handling that adapts to page size and content type. By eliminating redundancy, routing tasks to appropriate models, and offering flexible backend options, Hermes demonstrates how architectural changes can deliver both speed and cost efficiency.