Why AI's Hunger for Thinking Time Is About to Transform the Chip Industry
As artificial intelligence models spend more time thinking through problems, they will generate unprecedented demand for memory semiconductors, according to OpenAI's Vice President of Research. This shift fundamentally changes how the tech industry should evaluate AI progress and has major implications for chip manufacturers, particularly in South Korea.
Why Is Longer AI Reasoning Time Creating a Memory Crisis?
When AI models take more time to reason through a problem, they generate vastly more internal data that needs to be stored temporarily. This is where memory chips become critical. Noam Brown, Vice President of Research at OpenAI, explained at the Global AI Frontier Symposium 2026 in Seoul that extended reasoning times create a steep rise in memory demand.
"South Korean memory semiconductors will become even more important going forward, and the memory bottleneck will persist," Brown stated.
Noam Brown, Vice President of Research at OpenAI
Recent research by Dell engineers confirmed that reasoning length and cache memory increase in lockstep. The longer a model reasons, the more memory capacity it needs, in direct proportion. Brown pointed to a concrete example: OpenAI's model that achieved gold medal-level performance at the International Mathematical Olympiad last year "was the result of the AI model reasoning for several hours."
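To make the lockstep relationship concrete, here is a minimal sketch in Python. It assumes the "cache memory" in question is the key-value (KV) cache of a standard transformer, which stores one key vector and one value vector per token, per layer, per KV head. The article does not describe any particular model, so the function name and the model dimensions below are hypothetical and chosen only for illustration.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence of num_tokens tokens."""
    # Factor of 2: one key vector plus one value vector per token, layer, and KV head.
    # bytes_per_value=2 assumes 16-bit storage (an assumption, not from the article).
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# Hypothetical model shape, not taken from any real system.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for tokens in (1_000, 10_000, 100_000):
    gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{tokens:>7} tokens -> {gib:.2f} GiB")
```

Under these assumptions the cache grows linearly: ten times as many reasoning tokens need ten times as much cache memory for that sequence.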
This directly contradicts the "semiconductor peak-out" theory that has circulated recently, which suggests chip demand will plateau as AI development matures. Brown argued that infrastructure will remain a persistent bottleneck precisely because of this reasoning-time trend.
How Should We Actually Be Measuring AI Progress?
Brown made a provocative argument: the way the industry currently evaluates AI models is fundamentally misleading. Most comparisons report a single benchmark score without disclosing how many tokens each model consumed to reach it. The result is an incomplete picture of real-world performance.
He used OpenAI's GPT-5.5 model as a case study. Judged on benchmark scores alone, GPT-5.5 appeared to improve only modestly over its predecessor: its terminal benchmark score rose from 75 percent to 83 percent, and its cybersecurity evaluation score from 79 percent to 82 percent. Many observers dismissed these gains as "nothing special." However, when researchers plotted performance against the number of output tokens consumed, the picture changed dramatically. The older model required far more tokens to reach comparable answers, revealing that GPT-5.5 was substantially more efficient.
"The old model consumed far more tokens to reach an answer, and the real performance difference only becomes apparent when comparing at the same token volume," Brown emphasized.
The same analysis shows that frontier models do not plateau as the token budget grows. In cybersecurity evaluations by the UK's AI Safety Institute, the latest model's performance continued improving even when using up to 100 million tokens. The evaluation stopped at that point only because the institute "had no more time to run it," not because the model had reached its capability ceiling. In contrast, older models like GPT-4o or Anthropic's Claude 3.7 Sonnet showed clear performance stagnation at certain token thresholds.
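The equal-budget comparison Brown describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used by OpenAI or any evaluator: the function `score_at_budget` is hypothetical, and the measurements for the two models are invented placeholders rather than real benchmark data.

```python
from bisect import bisect_right


def score_at_budget(curve, budget):
    """Best score reached using at most `budget` output tokens.

    `curve` is a list of (tokens_used, score) pairs sorted by tokens_used.
    Returns None when the model has no measurement within the budget.
    """
    tokens = [t for t, _ in curve]
    idx = bisect_right(tokens, budget)  # number of measurements within budget
    if idx == 0:
        return None
    return max(score for _, score in curve[:idx])


# Invented measurements for two hypothetical models (NOT real results).
old_model = [(10_000, 0.40), (50_000, 0.60), (200_000, 0.70)]
new_model = [(10_000, 0.62), (50_000, 0.74), (200_000, 0.76)]

for budget in (10_000, 50_000, 200_000):
    old = score_at_budget(old_model, budget)
    new = score_at_budget(new_model, budget)
    print(f"budget={budget:>7}: old={old} new={new}")
```

With placeholder curves like these, the two models look close at the largest budget but far apart at small ones, which is the pattern a single headline score hides.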
What Are the Hidden Safety Implications of Test-Time Compute?
Brown raised a critical concern: current AI safety evaluations are mostly conducted with low token budgets, meaning results could differ dramatically if an organization deployed massive inference resources. A safety assessment conducted with a budget of $10 or $100 might conclude a model is "not dangerous," but investing $1 million in inference could reveal far more powerful capabilities that weren't tested.
This creates a gap between how models are evaluated and how they might actually be deployed. Brown argued that governments, companies, and AI labs must fundamentally change how they report AI capabilities and safety assessments.
Steps to Improve AI Evaluation and Transparency
- Disclose Inference Resources: Organizations should report tokens, costs, and time spent during testing rather than presenting benchmarks as single numbers, allowing stakeholders to understand the full context of model capabilities.
- Standardize Budget Caps in Benchmarks: Third-party benchmarks and leaderboards should indicate the token volume used by models or impose explicit budget caps to ensure fair comparisons across different AI systems.
- Update Safety Frameworks: AI companies' preparedness frameworks and responsible scaling policies must explicitly reflect the token volume and inference resources used during testing and evaluation.
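As one illustration of what such disclosure could look like, the sketch below defines a result record that carries tokens, cost, and time alongside the score, plus an optional budget cap. The class name `EvalReport`, its fields, and the sample values are all hypothetical; no existing reporting standard is implied.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EvalReport:
    """One benchmark result together with the inference resources spent on it."""
    model: str
    benchmark: str
    score: float              # fraction of tasks solved, 0.0 to 1.0
    output_tokens: int        # total output tokens consumed during the run
    cost_usd: float           # inference spend for the run
    wall_clock_hours: float   # elapsed time for the run
    token_cap: Optional[int] = None  # explicit budget cap, if one was imposed

    def within_cap(self) -> bool:
        """True when no cap was set or the run stayed at or under it."""
        return self.token_cap is None or self.output_tokens <= self.token_cap


# Placeholder values for illustration only.
report = EvalReport(
    model="example-model",
    benchmark="example-benchmark",
    score=0.8,
    output_tokens=2_500_000,
    cost_usd=100.0,
    wall_clock_hours=1.5,
    token_cap=5_000_000,
)
print(json.dumps(asdict(report), indent=2))
```

Publishing a record like this next to each score would let readers compare models at matching budgets and see at a glance whether a safety evaluation was run at $100 or $1 million of inference.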
Why Are Tech Giants Building Their Own AI Chips?
The shift toward longer reasoning times is driving major technology companies to develop custom AI chips. Anthropic is in discussions with Samsung Electronics on custom AI chip development and has begun initial work on its own chip. OpenAI is collaborating with Broadcom and Taiwan Semiconductor Manufacturing Company (TSMC), targeting deployment of its first inference chip in the second half of 2026.
This move is not primarily about cost reduction. Instead, analysts describe it as a strategy to maximize inference speed and energy efficiency through hardware optimized for each company's specific model architectures and workloads. It also secures long-term bargaining power with suppliers. While it reduces dependence on Nvidia GPUs, its significance lies more in strategic independence than in immediately displacing Nvidia's dominance.
Brown's remarks about South Korea's semiconductor leadership suggest that even as companies develop custom chips, the underlying demand for memory semiconductors will remain robust. South Korea's Samsung Electronics and SK Hynix are global leaders in memory chip manufacturing, positioning them to benefit from the inference-time scaling trend regardless of which companies are building custom processors.
The broader implication is clear: the AI industry is entering a new phase where thinking time, not just model size, drives progress. This shift will reshape infrastructure investments, evaluation methodologies, and the competitive landscape for semiconductor manufacturers for years to come.