AI Just Discovered How to Think More Efficiently Than Humans Designed
An AI agent has discovered reasoning strategies that are more efficient than those human engineers designed, cutting the token cost of advanced AI reasoning by roughly 70% while achieving better accuracy per unit of compute on math problems. Researchers from the University of Maryland, Google, Meta, and several other universities let Claude Code loose in a controlled environment, where it autonomously discovered test-time scaling algorithms that outperform established human-designed methods. The entire discovery process cost $39.90 in compute resources and took 160 minutes.
What Is Test-Time Scaling and Why Does It Matter?
Test-time scaling is a technique that improves large language model (LLM) performance by allocating more computational resources during response generation. Instead of rushing to an answer, a model might run multiple solution paths in parallel or extend chains of thought before settling on a final response. Think of it as giving an AI system more time to think through a difficult problem, similar to how a student might work through several approaches to a math problem before choosing the best one.
The practical implications are significant. Inference costs, the expense of running AI models to generate responses, remain a major barrier to deploying advanced reasoning techniques in production systems. If AI agents can discover more efficient scaling algorithms, that directly translates to lower API bills and broader accessibility for companies and developers using these systems.
How Did Researchers Flip the Script on Algorithm Design?
The project, called AutoTTS, represents a fundamental shift in how AI researchers approach optimization. Traditionally, human engineers manually designed the rules controlling when a model should branch into new solution paths, double down on promising ones, or abandon dead ends. The AutoTTS team asked a different question: why not let an AI agent discover these rules automatically?
The researchers identified that many known test-time scaling methods are really just special cases within a shared control space defined by two variables: width, which controls how many solution paths run simultaneously, and depth, which determines how far each path extends. Rather than humans plotting paths through this space by hand, they built an environment where Claude Code could search it autonomously.
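One way to picture that shared control space is as a pair of numbers that any strategy must choose. The sketch below is only an illustration: the `Budget` class, the `max_tokens` helper, and the depth of 8,000 tokens are assumptions made for the example, not details from the paper. Only the width of 64 for self-consistency comes from the reported comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """A point in the width/depth control space (hypothetical helper)."""
    width: int  # how many solution paths run simultaneously
    depth: int  # how many tokens each path may extend to

    def max_tokens(self) -> int:
        # Upper bound on spend if every path runs to its full depth.
        return self.width * self.depth


# Two familiar strategies as fixed points in the same space.
# The depth value is an illustrative assumption.
single_chain = Budget(width=1, depth=8000)
self_consistency = Budget(width=64, depth=8000)

print(single_chain.max_tokens(), self_consistency.max_tokens())
```

Seen this way, a hand-designed method picks one fixed point or a fixed route between points, while a searched controller is free to change width and depth as evidence comes in.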
The clever engineering kept costs manageable. Instead of repeatedly calling the language model during the search process, researchers pre-generated several solution paths for each task and stored them offline. A new control algorithm then decides how to spend compute based on data already available. This approach allowed thousands of algorithm variants to run without firing up the actual language model each time, dramatically reducing the cost of the discovery process.
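The offline replay idea can be sketched in a few lines. Everything here is hypothetical: the cache layout, the `early_stop_vote` controller, and the `replay` function are stand-ins chosen for illustration, and the controller is a simple early-stopping vote, not the algorithm the agent discovered. The point is only that a candidate controller can be scored against stored paths without calling a model.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# One cached path: (tokens it cost to generate, final answer it reached).
Path = Tuple[int, str]
Controller = Callable[[List[Path]], Tuple[str, int]]


def early_stop_vote(paths: List[Path], margin: int = 2) -> Tuple[str, int]:
    """Hypothetical controller: read cached paths in order and stop once
    the leading answer is `margin` votes ahead of the runner-up.
    Assumes `paths` is non-empty."""
    votes: Counter = Counter()
    tokens = 0
    for cost, answer in paths:
        votes[answer] += 1
        tokens += cost
        ranked = votes.most_common(2)
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if ranked[0][1] - runner_up >= margin:
            break
    return votes.most_common(1)[0][0], tokens


def replay(controller: Controller,
           cache: Dict[str, List[Path]],
           truth: Dict[str, str]) -> Tuple[float, int]:
    """Score a controller on cached paths only; no model is called."""
    correct = 0
    total_tokens = 0
    for task_id, paths in cache.items():
        answer, tokens = controller(paths)
        correct += int(answer == truth[task_id])
        total_tokens += tokens
    return correct / len(cache), total_tokens
```

Because `replay` only reads from the cache, trying a new controller costs a function call per task, which is what would make evaluating thousands of variants affordable.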
How Did the Discovery Process Work?
- Agent Iteration: Claude Code reviewed results from previous attempts, identified weaknesses in earlier proposals, and wrote new control algorithms directly in code across several rounds of refinement.
- Constraint Design: To prevent the search from getting lost in thousands of minor adjustments, each proposal could only expose one high-level controller to the outside world, which then set all other thresholds automatically.
- Performance Feedback: Full logs from each run showed the agent exactly where earlier attempts wasted compute, enabling continuous improvement toward more efficient solutions.
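The constraint described above can be read as a single exposed setting from which every internal threshold is derived. The sketch below follows that reading, which is an assumption about what "one high-level controller" means. The names `Thresholds` and `from_knob`, the three thresholds, and all of the numbers are invented for illustration and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Internal settings a proposal might use (all hypothetical)."""
    max_width: int    # most paths the controller may open
    vote_margin: int  # lead required before stopping early
    prune_after: int  # tokens a path may use before it can be abandoned


def from_knob(effort: float) -> Thresholds:
    """Derive every internal threshold from one exposed setting in [0, 1]."""
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must be between 0 and 1")
    return Thresholds(
        max_width=4 + round(60 * effort),
        vote_margin=2 + round(4 * effort),
        prune_after=1000 + round(7000 * effort),
    )


lean = from_knob(0.0)      # cheapest configuration
thorough = from_knob(1.0)  # most expensive configuration
```

Tying the thresholds to one setting keeps the search over proposals about control logic rather than about tuning many independent numbers.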
What Results Did the Discovered Algorithm Achieve?
On math benchmarks like AIME and HMMT, the algorithm Claude Code discovered achieves better accuracy per unit of compute than established methods. The efficiency gains are substantial: in its leanest configuration, the algorithm cuts token usage by approximately 70% compared to standard self-consistency, which typically generates 64 answers in parallel and selects the most common one.
These aren't marginal improvements. Standard self-consistency has to generate dozens of complete responses before it can compare them, so much of its compute goes to answers that merely repeat one another. The AI-discovered approach does better on challenging math problems while spending far less on such redundant work.
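For reference, the self-consistency baseline and the arithmetic behind a 70% reduction look like this. It is a minimal sketch: the function names are hypothetical, and the figure of 8,000 tokens per answer is an assumed value used only to make the numbers concrete.

```python
from collections import Counter
from typing import List


def self_consistency(answers: List[str]) -> str:
    """Baseline: return the most common answer among the samples."""
    return Counter(answers).most_common(1)[0][0]


def tokens_after_reduction(baseline_tokens: int, percent_saved: int) -> int:
    """Tokens left after cutting `percent_saved` percent of the baseline."""
    return baseline_tokens * (100 - percent_saved) // 100


# 64 answers at an assumed 8,000 tokens each.
baseline = 64 * 8000
lean = tokens_after_reduction(baseline, 70)
print(baseline, lean)
```

Under that assumption, a task that costs 512,000 tokens with the baseline would cost 153,600 tokens after a 70% reduction.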
What Does This Shift Mean for AI Research?
The AutoTTS paper points to a broader methodological shift in AI research. The bottleneck to progress may no longer be human architectural ingenuity in designing algorithms, but rather our ability to build effective discovery environments where AI agents can innovate autonomously. Researchers are moving from a paradigm where humans design strategies to one where humans design the environments that foster algorithmic innovation.
This meta-level change involves humans defining states, actions, and feedback mechanisms, then stepping back to let the search process unfold. The research team included collaborators from the University of Maryland, University of Virginia, Washington University in St. Louis, University of North Carolina, Google, and Meta, reflecting broad institutional interest in this approach.
Discussion in AI research communities has focused on the meta-nature of the achievement. Some researchers have expressed skepticism about whether the findings generalize beyond math benchmarks, while others view it as the beginning of self-optimizing AI systems that could accelerate their own development. Either way, the roughly 70% reduction in token usage matters immediately for practical deployment, because it makes advanced reasoning techniques cheaper to run in real-world applications.