From ChatGPT to Robot Brains: Why AI's Next Frontier Is Teaching Machines to Understand Physics
The next frontier in artificial intelligence isn't about predicting the next word in a sentence; it's about teaching machines to understand how the physical world actually works. A growing number of AI entrepreneurs and leading scientists, including "Godmother of AI" Fei-Fei Li and Yann LeCun, are shifting focus from language models that power chatbots like ChatGPT to "world models" that could unlock a new generation of intelligent robots and embodied AI systems.
What Are World Models and Why Do They Matter?
World models represent a fundamental shift in how researchers think about artificial intelligence. While language models learn patterns in text by predicting the next word, world models learn the statistical structure of space and time. This means understanding how light falls on a surface, how objects respond to physical force, and how a garden looks from an angle no camera has ever captured.
The distinction matters because current AI chatbots have a critical limitation: they cannot pick up a coffee mug. As Martial Hebert, dean of computer science at Carnegie Mellon University, explained, there is far more complexity involved in physical interaction than in predicting text.
"There's all the geometry of the world, the dynamic of how I move my hand, the physical interaction of the contact with the cup. This is much more complex than just predicting the next word in a sentence," said Martial Hebert.
Why Are Top Researchers Leaving Language Models Behind?
Louis Castricato spent eight years at Brown University studying large language models (LLMs), AI systems trained on vast amounts of text to predict and generate human language, before concluding he had hit a dead end. "We basically have passed the point of doing real fundamental LLM research," Castricato said. "Now it's just applications." He left academia to start Overworld, a Rhode Island-based startup building AI systems that can understand and navigate physical environments rather than just process text.
This exodus from language model research reflects a broader recognition among scientists that while chatbots have transformed office work and creative fields, they are fundamentally limited when it comes to physical tasks. Hebert, who has spent more than four decades researching robotics, sees world models as the faster and cheaper path to what the industry calls "physical AI" or "embodied AI," which represents the evolution of traditional robotics.
How Are World Models Being Developed and Deployed?
Researchers are pursuing multiple approaches to world models, each optimized for different applications. Fei-Fei Li, founder of the San Francisco startup World Labs, has proposed a taxonomy dividing world models into three distinct categories:
- Renderers: These prioritize visual fidelity and create gorgeous virtual worlds, but they cannot reliably teach robots about physics because they may produce physically impossible scenarios like flames that defy the laws of combustion.
- Simulators: These create virtual training grounds that faithfully represent the physical structure of the world, allowing robots to learn realistic physics before operating in real environments.
- Planners: These predict what an AI agent or robot should actually do in an unstructured, real-world environment, representing the most commercially valuable approach for practical robotics applications.
"Where language models learn the statistical structure of text, world models learn the statistical structure of space and time: how light falls on a surface, how a garden looks from an angle no camera has captured, how objects respond to force and follow the laws of physics," wrote Fei-Fei Li.
Overworld is taking a different approach, optimizing for interaction above all else. The startup is building video game worlds where scenes can adapt as a virtual character moves through them and interacts with objects. "There's no other world model where you can just walk through doors or where you can interact with a detailed environment like this," Castricato explained.
What's Driving Investment in This Space?
Although the near-term applications are less obvious than those of AI coding tools, venture capitalists are pouring money into world model companies. Steve Jang, co-founder and managing partner at Kindred Ventures, is investing in Overworld and other world model-focused startups, including Causal Labs, which builds AI models for weather prediction, and Extropic, which designs specialized computer chips suited to world models.
"I think that the future is many different types of models with many different philosophies and architectures. I don't think that it'll be one large, dense model to rule them all," said Steve Jang.
This investment enthusiasm reflects a broader recognition that world models could unlock the next wave of AI capability. While investors continue to commit trillions of dollars to language model developers like Anthropic and OpenAI, a growing number of entrepreneurs believe the real breakthrough will come from teaching AI systems to understand and navigate physical reality.
How Is the Physical AI Market Accelerating Globally?
The shift toward embodied AI is happening at a global scale, with China emerging as a particularly aggressive player. Morgan Stanley has sharply raised its outlook for China's humanoid robotics market, expecting 50,000 units to ship in 2026, nearly double its previous projection of 28,000 units. The bank estimates China's humanoid robot market will reach $2 billion this year and grow to $15 billion by 2030, with annual shipments forecast to reach 446,000 units by then.
This acceleration reflects a shift from demonstration to commercial deployment happening faster than expected. Chinese manufacturers are racing to scale production and deploy robots in real-world settings such as factories, convenience stores, and restaurants. Beijing has made developing embodied AI a priority for the coming five years, directing local governments to subsidize startups with land and office space while ordering banks to extend favorable lending terms.
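For readers who want to check the arithmetic behind the Morgan Stanley figures cited above, the short Python sketch below works out the size of the forecast revision and the annual growth rates the projections imply. It assumes "this year" means 2026, which gives four years of growth to 2030; that reading, and the helper name implied_annual_growth, are assumptions made here rather than anything stated by the bank.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate implied by a start value and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article; YEARS is an assumption (2026 to 2030).
YEARS = 4

# How much larger the new 2026 shipment forecast is than the previous one.
revision_ratio = 50_000 / 28_000

# Growth implied by a $2 billion market reaching $15 billion by 2030.
market_growth = implied_annual_growth(2e9, 15e9, YEARS)

# Growth implied by shipments rising from 50,000 to 446,000 units.
shipment_growth = implied_annual_growth(50_000, 446_000, YEARS)

print(f"Forecast revision: {revision_ratio:.2f}x the earlier projection")
print(f"Implied market growth: {market_growth:.1%} per year")
print(f"Implied shipment growth: {shipment_growth:.1%} per year")
```

Under that four-year assumption, the revision is a bit under double the earlier projection, and both the revenue and shipment forecasts imply growth well above 50 percent a year.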
Meanwhile, in the United States, the market is beginning to take shape in unexpected ways. KOID SHOP, a pop-up storefront dedicated to general-purpose robots, is set to open in New York City's SoHo neighborhood on June 26, 2026, marking one of the first opportunities for American consumers to see and interact with leading robotic platforms in person. The store will feature robots from manufacturers including Unitree, AgiBOT, Booster, and LimX Dynamics, with capabilities on display including sports and dancing routines, home care, educational tutoring, and home safety monitoring.
"We believe humanoid robotics and physical AI represent one of the most important technology opportunities of the coming decade. Opening a robot store in the heart of New York City is an opportunity to help the public experience firsthand how quickly these technologies are advancing and how they may transform everyday life," said Jonathan Krane.
Krane is the CEO of KraneShares.
How to Understand the Three Types of World Models and Their Applications
Understanding the different approaches to world models helps explain why researchers and investors are pursuing multiple strategies rather than betting on a single technology. Each type serves different purposes in the broader effort to create robots that can work reliably in the real world:
- Visual Fidelity Focus: Renderer models prioritize creating visually stunning virtual environments but sacrifice physical accuracy, making them useful for entertainment and visualization but less suitable for training robots that must obey real-world physics.
- Physics Accuracy Focus: Simulator models prioritize faithful representation of how the physical world actually works, allowing robots to practice tasks in virtual environments before attempting them in real settings where mistakes could be costly.
- Decision-Making Focus: Planner models prioritize predicting what actions a robot should take in unpredictable real-world situations, representing the holy grail of robotics because a robot that can plan effectively is a robot that can actually work.
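As a rough illustration of how a simulator and a planner divide the work, here is a minimal Python sketch. Everything in it is hypothetical and chosen for illustration: the State class, the simulate_step, rollout, and plan functions, and the mass, friction, and time-step values. It is not drawn from World Labs, Overworld, or any other system mentioned in this article. The toy simulator advances a sliding object with simple Euler integration, and the toy planner tries every short sequence of pushes inside that simulator and keeps the one that ends closest to a target position.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class State:
    position: float  # metres along a table
    velocity: float  # metres per second


def simulate_step(state, force, mass=0.3, friction=0.8, dt=0.1):
    """Toy simulator: advance a sliding object by one Euler time step.

    The mass, friction, and dt values are illustrative assumptions.
    """
    acceleration = force / mass - friction * state.velocity
    velocity = state.velocity + acceleration * dt
    position = state.position + velocity * dt
    return State(position, velocity)


def rollout(state, forces):
    """Apply a sequence of forces in the simulator and return the end state."""
    for force in forces:
        state = simulate_step(state, force)
    return state


def plan(state, target, horizon=4, candidate_forces=(-1.0, 0.0, 1.0)):
    """Toy planner: exhaustively search short force sequences.

    Each candidate sequence is scored only by how far the simulated
    object ends from the target position.
    """
    def cost(forces):
        return abs(rollout(state, forces).position - target)

    return min(product(candidate_forces, repeat=horizon), key=cost)


start = State(position=0.0, velocity=0.0)
best_forces = plan(start, target=0.5)
end = rollout(start, best_forces)
print("Chosen pushes:", best_forces)
print("Simulated end position:", round(end.position, 3))
```

Real planners face far larger action spaces and rely on learned rather than hand-written dynamics, so an exhaustive search like this would not scale. The sketch only shows why the planner's choices are only as good as the physics of the simulator it consults.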
The convergence of world model research, venture capital investment, and commercial deployment suggests that physical AI is transitioning from a theoretical research area to a practical industry. As Martial Hebert explained, the human body itself demonstrates the power of general physical models: "In your body and spinal cord you have a very general model of how to balance, how to walk around, and you can adapt to your knee hurting in the morning, so you now walk a little differently. You don't need to think about that. You have a general model somewhere in your nervous system and brain that allows your body to adapt very quickly." Building AI systems with similar adaptive capabilities could unlock robots that work reliably in the real world, not just in controlled laboratory settings.