
Open-Source Robot Models Are Finally Catching Up to Proprietary Systems

Open-source robotics models are now competitive with proprietary systems built by well-funded tech companies, marking a significant shift in how robot intelligence gets developed and deployed. Allen AI released MolmoAct 2, an open foundation model for robots that outperforms closed commercial alternatives on industry benchmarks, runs dramatically faster, and comes with full transparency about how it works. The release includes the largest open-source bimanual robotics dataset ever published, containing over 700 hours of robot training demonstrations.

Why Does Open-Source Robot Intelligence Matter?

For years, the most capable robotics models have been locked behind proprietary walls. Companies like Physical Intelligence release impressive demos, but researchers can't study the underlying code, data, or reasoning processes. This opacity slows down the entire field. MolmoAct 2 changes that equation by proving that an open, transparent approach can match or exceed closed systems while enabling the broader research community to understand and improve the work.

The practical implications are significant. When robot models are open, researchers at universities, smaller companies, and independent labs can build on them without waiting for permission or paying licensing fees. They can identify weaknesses, propose improvements, and adapt the models to new tasks. This accelerates innovation in ways that proprietary systems, by design, cannot.

How Does MolmoAct 2 Actually Perform?

MolmoAct 2 was tested across three categories of robot tasks: simulation benchmarks, zero-shot deployment on real robots, and adaptation to new settings. The results demonstrate substantial improvements over its predecessor and competing systems.

  • Simulation Performance: On MolmoBot, a household manipulation benchmark designed to be difficult, MolmoAct 2 achieved a 20.6% success rate across all tasks, roughly double the 10.3% score of Physical Intelligence's π0.5 model.
  • Real-World Zero-Shot Tasks: When deployed on a Franka robot arm without any task-specific training, MolmoAct 2 reached 87.1% average success across 15 trials per task, compared with 48.4% for its predecessor and 45.2% for π0.5.
  • Specific Task Success Rates: The model achieved 100% success on simple pick-and-place tasks like moving an apple onto a plate, 86.7% on placing a pipette into a tray, 93.3% on inserting a small cube into a tape roll, and 62% on longer-horizon tasks involving multiple objects.

Speed matters as much as accuracy in real-world robotics. A robot that pauses visibly between movements feels sluggish and unreliable. MolmoAct 2 responds in approximately 180 milliseconds for basic actions and 790 milliseconds with advanced 3D reasoning, compared with 6,700 milliseconds for the original MolmoAct. That's roughly 37 times faster, the difference between a robot that hesitates and one that moves fluidly.

What Makes the Architecture Different?

MolmoAct 2 isn't simply an incremental update. The team rebuilt the architecture from the ground up, starting with a specialized version of their Molmo 2 language model trained on approximately 3 million examples of embodied reasoning tasks. These examples covered image-based pointing, object detection, spatial reasoning across multiple images, and video-based spatial understanding.

The model pairs this reasoning backbone with a dedicated action expert that generates robot movements through a technique called flow matching, connected via a KV-cache bridge. Importantly, the team also created an open-source reimplementation of the FAST tokenizer, a tool that compresses continuous robot action sequences into the discrete tokens a language model can predict. Physical Intelligence developed the original FAST tokenizer but hadn't released the training data. By publishing their own version, Allen AI ensured the entire pipeline remains transparent and reproducible.
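Flow matching may be unfamiliar, so here is a minimal sketch of the idea as it applies to action generation: train a small network to predict the velocity that carries a noise sample toward an expert action chunk, then integrate that velocity field at inference time. The dimensions, network shape, and ten-step Euler sampler below are illustrative assumptions, not the released MolmoAct 2 architecture.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 7-DoF actions, 16-step chunks, 512-dim VLM conditioning.
ACTION_DIM, CHUNK, COND_DIM = 7, 16, 512

class ActionExpert(nn.Module):
    """Tiny velocity-field network: predicts d(action)/dt given noisy actions,
    a flow time t in [0, 1], and conditioning features from the VLM backbone."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACTION_DIM * CHUNK + 1 + COND_DIM, 1024),
            nn.GELU(),
            nn.Linear(1024, ACTION_DIM * CHUNK),
        )

    def forward(self, x_t, t, cond):
        inp = torch.cat([x_t.flatten(1), t[:, None], cond], dim=-1)
        return self.net(inp).view(-1, CHUNK, ACTION_DIM)

def flow_matching_loss(model, actions, cond):
    """Linear-path conditional flow matching: regress the velocity (x1 - x0)."""
    x1 = actions                                   # expert action chunk
    x0 = torch.randn_like(x1)                      # noise sample
    t = torch.rand(x1.shape[0], device=x1.device)  # random flow time
    x_t = (1 - t)[:, None, None] * x0 + t[:, None, None] * x1
    return ((model(x_t, t, cond) - (x1 - x0)) ** 2).mean()

@torch.no_grad()
def sample_actions(model, cond, steps=10):
    """Euler-integrate the learned ODE from noise to an action chunk."""
    x = torch.randn(cond.shape[0], CHUNK, ACTION_DIM, device=cond.device)
    for i in range(steps):
        t = torch.full((cond.shape[0],), i / steps, device=cond.device)
        x = x + model(x, t, cond) / steps
    return x
```

A few Euler steps over a learned velocity field produce a full continuous action chunk at once, which is part of why flow-matching action heads tend to be fast at inference.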

One innovation called adaptive-depth reasoning allows the model to focus computational effort where it matters most. Instead of analyzing every pixel for 3D depth information, the system predicts depth only in regions where the scene is changing dynamically. This selective approach achieves a 17% speedup compared to full depth analysis while maintaining performance on tasks that benefit from explicit 3D understanding.
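The article doesn't spell out the mechanism, but the idea can be sketched simply: compare consecutive frames, flag the patches that changed, and run the expensive depth predictor only there, reusing cached depth everywhere else. Everything below, including the patch size, threshold, and `depth_model` stand-in, is a hypothetical illustration rather than Allen AI's implementation.

```python
import numpy as np

PATCH = 32  # patch size; assumes frame height and width are multiples of PATCH

def changed_patches(prev_frame, frame, thresh=12.0):
    """Return a boolean grid marking patches whose mean absolute pixel
    change between consecutive frames exceeds the threshold."""
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32)).mean(axis=-1)
    h, w = diff.shape
    grid = diff.reshape(h // PATCH, PATCH, w // PATCH, PATCH).mean(axis=(1, 3))
    return grid > thresh

def selective_depth(frame, prev_frame, cached_depth, depth_model):
    """Reuse cached depth for static patches and re-estimate only the dynamic
    ones. `depth_model` is a stand-in mapping an RGB patch to a depth patch."""
    depth = cached_depth.copy()
    for i, j in zip(*np.nonzero(changed_patches(prev_frame, frame))):
        ys = slice(i * PATCH, (i + 1) * PATCH)
        xs = slice(j * PATCH, (j + 1) * PATCH)
        depth[ys, xs] = depth_model(frame[ys, xs])
    return depth
```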

What Data Powers This Model?

MolmoAct 2 was trained on the MolmoAct 2-Bimanual YAM dataset, a collection of over 700 hours of robot demonstrations involving two coordinated arms working together. This is the largest open-source bimanual robotics dataset ever released, containing more than 30 times the robot data used for the original MolmoAct. The dataset covers practical tasks like folding towels, scanning groceries, charging smartphones, and clearing tables.

The team supplemented this with a diverse mix of other open robotics datasets, exposing the model to different robot arms, camera configurations, control schemes, and task styles. This includes data from low-cost open-source robot arms, real-world manipulation tasks with Franka robots, instruction-conditioned manipulation from Google and other sources, and data from the WidowX robot platform. The variety ensures the model generalizes beyond any single robot setup.
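One common way to combine heterogeneous robot datasets, sketched below under stated assumptions, is weighted sampling so each batch reflects a fixed mixture rather than raw dataset sizes. The dataset names and ratios here are placeholders; the article doesn't give MolmoAct 2's actual mixture weights.

```python
import random

# Placeholder mixture; the real MolmoAct 2 training ratios are not public here.
MIXTURE = {
    "bimanual_yam": 0.5,    # the new 700+ hour bimanual dataset
    "franka_real": 0.2,     # real-world Franka manipulation
    "widowx": 0.15,         # WidowX platform data
    "low_cost_arms": 0.15,  # open-source low-cost arm demos
}

def sample_batch(loaders, batch_size=64):
    """Draw each example from a dataset chosen in proportion to its weight,
    so no single robot embodiment dominates a training batch."""
    names, weights = zip(*MIXTURE.items())
    picks = random.choices(names, weights=weights, k=batch_size)
    return [next(loaders[name]) for name in picks]
```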

Data quality matters as much as quantity. Many robotics datasets contain repetitive task labels or low-quality annotations, such as placeholder strings left over from test runs. Allen AI re-annotated robot demonstrations using an open vision-language model, increasing the number of unique task labels from approximately 71,000 to 146,000. Better annotations help the model learn more diverse behaviors and understand task variations.
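As a rough illustration of the re-annotation step: feed sampled frames from each demonstration to a vision-language model and replace weak labels with its description. The `vlm.describe` interface and the crude quality gate below are hypothetical, not Allen AI's actual pipeline.

```python
def reannotate(episode, vlm):
    """Replace a terse or placeholder task label with a VLM-written one.
    `episode` and `vlm` are hypothetical objects used for illustration."""
    frames = episode.sample_frames(n=8)  # frames spread across the clip
    prompt = ("Describe the manipulation task the robot performs in these "
              "frames as a single imperative instruction.")
    new_label = vlm.describe(frames, prompt)
    if len(new_label.split()) >= 3:      # reject degenerate one-word labels
        episode.task_label = new_label
    return episode
```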

How Does This Fit Into the Broader Robotics Landscape?

The release of MolmoAct 2 arrives at a moment when the robotics field is consolidating around foundation models, similar to how large language models transformed natural language processing. Major tech companies are acquiring robotics research teams, building proprietary hardware, and investing heavily in embodied AI. Meta's acquisition of Assured Robot Intelligence in May 2026 exemplifies this trend, with the company folding the team into its Superintelligence Labs to develop the intelligence layer for humanoid robots.

Meanwhile, practical deployments are accelerating. Locus Robotics' Array system is now live in production at DHL, handling pick, putaway, induction, and replenishment tasks with a claimed 90% reduction in manual labor. In China, robots are stepping into everyday roles, from cleaning homes to directing traffic, as the country accelerates its embrace of embodied AI. A cleaning service launched in March pairs human cleaners with wheeled robots, with the robot handling approximately 30% of the workload on repetitive tasks like wiping tables and cleaning floors.

Open-source models like MolmoAct 2 create a counterbalance to this consolidation. By publishing model weights, datasets, and architectural innovations, Allen AI enables researchers without access to massive corporate resources to contribute to robotics progress. This democratization of robot intelligence could accelerate innovation in unexpected directions and ensure that robotics development isn't controlled exclusively by a handful of well-funded companies.

Steps to Understand and Use MolmoAct 2

  • Access the Model: MolmoAct 2 model weights, the MolmoAct 2-Bimanual YAM dataset, and the updated VLA (Vision-Language-Action) pipeline are available for researchers to download and study, including the adaptive-depth reasoning approach that improves 3D spatial understanding; a minimal loading sketch appears after this list.
  • Review the Benchmarks: Researchers can evaluate the model on industry-standard benchmarks like MolmoBot and RoboEval to understand its capabilities and limitations before deploying it on their own robot platforms.
  • Adapt for New Tasks: The model handles various real-world tasks out of the box without per-task fine-tuning, but researchers can also post-train it on single-arm and bimanual tasks specific to their applications, as demonstrated in Allen AI's evaluation.
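For readers who want to try the model, here is a minimal loading sketch, assuming the weights are published on the Hugging Face Hub with a custom remote-code pipeline. The repository id, prompt format, and preprocessing details are guesses; check Allen AI's release documentation for the real instructions.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Hypothetical repository id; consult the official release for the real one.
MODEL_ID = "allenai/MolmoAct-2"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, trust_remote_code=True, device_map="auto"
)

frame = Image.open("wrist_camera.png")  # current robot observation
inputs = processor(
    images=[frame],
    text="pick up the apple and place it on the plate",
    return_tensors="pt",
).to(model.device)

# The model would emit action tokens to be decoded by the FAST-style tokenizer.
action_tokens = model.generate(**inputs, max_new_tokens=64)
```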

The release of MolmoAct 2 signals that open-source robotics models are no longer experimental proofs of concept. They're competitive, deployable systems that can handle real-world tasks reliably. For the robotics field, this means the foundation-model era isn't exclusively the domain of Big Tech. It's becoming a shared endeavor where transparency, reproducibility, and community contribution drive progress forward.