
Open-Source Robots Just Got 37 Times Faster: Why This Changes Everything

Robots that can reliably fold towels, load dishwashers, or prep lab samples have remained out of reach for most AI systems, but a major breakthrough from the Allen Institute is changing that landscape. The team just released MolmoAct 2, an open-source robot foundation model that not only outperforms proprietary competitors on industry benchmarks but also runs dramatically faster and handles real-world tasks without requiring custom training for each new job.

What Makes MolmoAct 2 Different From Other Robot AI Models?

MolmoAct 2 represents a fundamental rethinking of how robots should perceive and act in the physical world. Unlike many robotics models that treat vision and action as separate problems, MolmoAct 2 builds on a specialized vision-language model called Molmo 2-ER, which was trained on approximately 3 million examples of embodied reasoning tasks. This includes image-based pointing, object detection, spatial reasoning, and video-based spatial understanding.

The model's reasoning capabilities are impressive by any standard. Across 13 different embodied-reasoning benchmarks covering tasks like pointing, multi-image reasoning, and video spatial reasoning, Molmo 2-ER scored an average of 63.8 out of 100, outperforming systems including GPT-4V, Gemini 2.5 Pro, and other leading vision-language models.

What sets MolmoAct 2 apart is its speed. A single action call takes approximately 180 milliseconds in the base model and 790 milliseconds with advanced 3D reasoning capabilities, compared to 6,700 milliseconds for the original MolmoAct. That 37-fold speedup means robots can respond to their environment in near-real time rather than pausing visibly between movements.
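
To put those latencies in control-loop terms, here is a quick back-of-the-envelope check using only the figures reported above:

```python
# Latencies reported above, in milliseconds per action call.
latencies_ms = {
    "MolmoAct (original)": 6700,
    "MolmoAct 2 (base)": 180,
    "MolmoAct 2 (3D reasoning)": 790,
}

for name, ms in latencies_ms.items():
    # Actions per second the robot could issue at this latency.
    print(f"{name}: {ms} ms/action -> {1000 / ms:.2f} actions/s")

speedup = latencies_ms["MolmoAct (original)"] / latencies_ms["MolmoAct 2 (base)"]
print(f"Base-model speedup: {speedup:.1f}x")  # ~37x
```

At 180 ms per call the base model can issue roughly 5.5 actions per second, versus about one action every 6.7 seconds for the original, which is the difference between fluid motion and visible pauses.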

How Does MolmoAct 2 Actually Work in Real-World Settings?

The Allen Institute tested MolmoAct 2 across simulation environments, zero-shot real-world deployment, and post-training adaptation scenarios. In real-world tests on a Franka robot arm, the model achieved remarkable success rates across diverse manipulation tasks. On straightforward pick-and-place work like moving an apple onto a plate, it reached 100% success. More complex tasks showed similarly strong performance: 86.7% success on placing a pipette into a tray, 93.3% on positioning a small red cube into the center of a tape roll, and 93.3% on inserting a knife into a box.

Across 15 trials per task, MolmoAct 2 averaged 87.1% overall success on these real-world manipulation tasks, compared with 48.4% for the original MolmoAct model and 45.2% for Physical Intelligence's π0.5 system.
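
Those per-task percentages fall directly out of 15 trials each; here is a quick sanity check, with the raw success counts inferred from the reported rates:

```python
# Success counts inferred from the reported rates over 15 trials per task
# (15/15 = 100%, 14/15 = 93.3%, 13/15 = 86.7%). Note the 87.1% overall
# average spans the full task suite, not just the four examples listed above.
TRIALS = 15
successes = {
    "apple onto plate": 15,
    "pipette into tray": 13,
    "red cube into tape roll": 14,
    "knife into box": 14,
}
for task, n in successes.items():
    print(f"{task}: {n}/{TRIALS} = {100 * n / TRIALS:.1f}%")
```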

What New Capabilities Come With the Release?

The Allen Institute is releasing far more than just the model weights. The package includes several components designed to help researchers build on this work:

  • MolmoAct 2-Bimanual YAM Dataset: Over 720 hours of robot demonstrations involving two-armed coordination tasks such as folding towels, scanning groceries, charging smartphones, and clearing tables, making it the largest open-source bimanual robotics dataset ever released.
  • Adaptive Depth Reasoning: A mechanism that routes 3D depth perception only when it's expected to improve task performance, achieving a 17% speedup compared to full depth-token prediction while maintaining reasoning quality (a conditional-routing sketch follows this list).
  • Open Action Tokenizer: A fully open-source reimplementation of the FAST tokenizer trained on the team's data, addressing a gap where proprietary tokenizers had previously limited reproducibility in the field.
  • Improved Language Annotations: The team re-annotated robot demonstrations using an open vision-language model, increasing unique task labels from approximately 71,000 to approximately 146,000 across the dataset mixture.
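
The release describes adaptive depth reasoning only at a high level. As a rough illustration of what such conditional routing can look like, here is a minimal sketch; every module name is hypothetical, not the Allen Institute's actual API:

```python
import torch
import torch.nn as nn

class AdaptiveDepthPolicy(nn.Module):
    """Illustrative sketch of adaptive depth routing (all names hypothetical):
    a learned gate decides per observation whether predicting 3D depth tokens
    is worth the extra latency before decoding an action."""

    def __init__(self, dim=512, n_depth_tokens=16, action_dim=7):
        super().__init__()
        self.depth_gate = nn.Linear(dim, 1)                     # scores "will depth help?"
        self.depth_head = nn.Linear(dim, n_depth_tokens * dim)  # expensive 3D path
        self.action_head = nn.Linear(dim, action_dim)           # decodes the action

    def forward(self, obs_emb, threshold=0.5):
        pooled = obs_emb.mean(dim=1)                   # (batch, dim) summary
        gate = torch.sigmoid(self.depth_gate(pooled))  # (batch, 1)

        if gate.mean().item() > threshold:
            # Expensive path: emit depth tokens, then condition the action on them.
            b, _, d = obs_emb.shape
            depth_tokens = self.depth_head(pooled).view(b, -1, d)
            obs_emb = torch.cat([obs_emb, depth_tokens], dim=1)

        # The cheap path skips depth entirely; taking it whenever depth is
        # unlikely to help is where a speedup like the reported 17% comes from.
        return self.action_head(obs_emb.mean(dim=1))

# Example: a batch of 2 observations, each with 64 tokens of 512-dim embeddings.
policy = AdaptiveDepthPolicy()
action = policy(torch.randn(2, 64, 512))
print(action.shape)  # torch.Size([2, 7])
```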

MolmoAct 2 also includes bimanual manipulation capabilities built directly into the base model, meaning users get two-armed coordination out of the box without needing to fine-tune the system for each new robot configuration.

How to Leverage Open-Source Robot Models for Your Research or Development

  • Access the Complete Toolkit: Download MolmoAct 2 model weights, the 720-hour bimanual dataset, and the adaptive depth reasoning implementation from the Allen Institute's public repository to study and build on the work without licensing restrictions.
  • Test Zero-Shot Deployment: Use the pre-trained model directly on your robot hardware without per-task fine-tuning to see if it handles your manipulation tasks, then evaluate performance before investing in custom training (see the loading sketch after this list).
  • Combine With Your Own Data: Supplement the released dataset with your specific robot configurations, camera setups, and task styles to adapt MolmoAct 2 to specialized applications in manufacturing, healthcare, or scientific research.
  • Study the Reasoning Architecture: Examine how Molmo 2-ER's embodied reasoning backbone enables 3D spatial understanding, and apply similar principles to other robotics challenges beyond tabletop manipulation.
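
As a starting point for the zero-shot test above, here is a minimal loading sketch built on Hugging Face transformers; the checkpoint ID, prompt format, and action decoding are assumptions to verify against the actual release:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Placeholder checkpoint ID; look up the real one in Ai2's release notes.
CKPT = "allenai/MolmoAct-2"  # hypothetical identifier

processor = AutoProcessor.from_pretrained(CKPT, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    CKPT, trust_remote_code=True, device_map="auto"
)

# One frame from the robot's camera plus a natural-language task instruction.
# The exact processor call may differ in the model's remote code.
image = Image.open("wrist_camera.jpg")
inputs = processor(
    images=image,
    text="pick up the apple and place it on the plate",
    return_tensors="pt",
).to(model.device)

# The generated tokens encode an action chunk; mapping them back to joint or
# end-effector commands goes through the released open action tokenizer.
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```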

Why Does Open-Source Matter for Robotics?

The robotics field has historically relied on closed, proprietary models that limit reproducibility and slow progress. Most teams that develop capable robot systems either keep their weights private, release models without training data, or publish neither. This fragmentation makes it difficult for researchers to understand what works, why it works, or how to improve on existing approaches.

By releasing MolmoAct 2 alongside 720 hours of training demonstrations, the Allen Institute is breaking that pattern. The team trained the model on a diverse mixture of robot datasets including low-cost open-source robot arms, real-world single-arm manipulation data, instruction-conditioned manipulation examples, and household and tabletop tasks. This breadth of training data helps the model generalize across different robot hardware, camera setups, and control schemes.

The practical impact is significant. Robotics labs can now study how a state-of-the-art model reasons about 3D space, understand why it succeeds or fails on specific tasks, and meaningfully improve on the work rather than starting from scratch. For organizations deploying robots in warehouses, laboratories, or service environments, access to a capable open-source foundation model cuts development time and cost compared with training a model in-house or licensing a proprietary system.

MolmoAct 2 represents a shift in how the AI and robotics communities approach foundation models. Rather than treating capable systems as black boxes locked behind commercial agreements, the Allen Institute is demonstrating that transparency and openness can coexist with strong performance. That combination could accelerate the timeline for robots that reliably handle repetitive physical work that remains difficult to staff, as well as the specialized scientific tasks that stand to benefit from automation.