FrontierNews.ai

AI Just Learned to Run Real Lab Experiments. Here's Why That Changes Everything

For the first time, artificial intelligence has crossed a critical threshold: it can now design biological experiments, write the code to run them, and execute those experiments on real laboratory equipment, all without human intervention. On July 3, 2026, MGI Tech and the Shanghai Artificial Intelligence Laboratory announced two breakthrough innovations that represent a fundamental shift in how AI interacts with the physical world. ProtoPilot, a self-evolving multi-agent system, and BioLab Bench, an evaluation framework, together establish what researchers are calling "Physical AI for life sciences."

This is not theoretical. When ProtoPilot encountered a failed antibiotic resistance screening step during a protein complex assembly experiment, it diagnosed the problem and autonomously regenerated a corrected protocol without human guidance. The system works by translating experimental intent into physically executable, verifiable, and reproducible actions on automated laboratory platforms.

How Does ProtoPilot Actually Work?

ProtoPilot operates as a full-chain agent system that covers the entire experimental lifecycle. The system moves through four distinct stages, each building on the previous one:

  • Design Phase: The system interprets the experimental goal and generates a detailed protocol for the biological task at hand.
  • Code Translation: The protocol is converted into executable code that can run on automated laboratory equipment.
  • Device Execution: The code runs on real automation platforms, performing the actual wet-lab work.
  • Feedback Loop: Results from the physical experiment feed back into the system, allowing it to learn from both successes and failures.

What makes this different from previous AI systems is the learning mechanism. ProtoPilot doesn't just generate answers and move on. It learns from failure. When something goes wrong in the lab, the system diagnoses the issue and regenerates a corrected approach, creating a continuous improvement cycle.
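The four stages and the retry-on-failure behavior described above can be illustrated with a minimal Python sketch. This is not ProtoPilot's actual implementation, which the announcement does not detail. The names `closed_loop`, `RunResult`, and the toy `design`, `translate`, and `execute` stand-ins are all hypothetical, and the "lab" here is simulated by a function that fails on one hard-coded step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RunResult:
    ok: bool
    failed_step: Optional[str] = None  # name of the step that failed, if any


@dataclass
class LoopState:
    # Feedback kept across attempts (successes and failures alike).
    history: List[RunResult] = field(default_factory=list)


def closed_loop(
    goal: str,
    design: Callable[[str, Optional[str]], List[str]],
    translate: Callable[[List[str]], List[str]],
    execute: Callable[[List[str]], RunResult],
    max_attempts: int = 3,
) -> Tuple[Optional[int], LoopState]:
    """Design -> translate -> execute, feeding each failure back into design."""
    state = LoopState()
    hint: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        protocol = design(goal, hint)   # design phase
        code = translate(protocol)      # code translation
        result = execute(code)          # device execution (simulated here)
        state.history.append(result)    # feedback loop
        if result.ok:
            return attempt, state
        hint = result.failed_step       # diagnose, then regenerate next round
    return None, state


# Toy stand-ins (assumptions for illustration only).
def toy_design(goal: str, hint: Optional[str]) -> List[str]:
    steps = ["assemble", "screen_v1", "verify"]
    if hint is not None:
        # Swap the step that failed last time for a corrected variant.
        steps = ["screen_v2" if s == hint else s for s in steps]
    return steps


def toy_translate(protocol: List[str]) -> List[str]:
    return [f"RUN {step}" for step in protocol]


def toy_execute(code: List[str]) -> RunResult:
    if "RUN screen_v1" in code:
        return RunResult(ok=False, failed_step="screen_v1")
    return RunResult(ok=True)


attempts, state = closed_loop("toy goal", toy_design, toy_translate, toy_execute)
print(attempts, [r.ok for r in state.history])  # prints: 2 [False, True]
```

In this toy run the first attempt fails at the screening step, the failure is passed back to the design stage, and the second attempt succeeds with the corrected step. A real system would replace each stand-in with a model call or a device driver.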

How Close Is AI to Expert-Level Lab Work?

The performance metrics reveal just how close AI has come to matching human expertise. Researchers tested ProtoPilot on ProtocolQA, one of the most representative public benchmarks for evaluating AI experimental reasoning capabilities. The results were striking: GPT-5.6-sol, a leading language model, scored 43.5% on the benchmark. Human experts achieved 54%. ProtoPilot achieved 52.38%, approaching expert-level performance.

The gap between ProtoPilot and human experts on this benchmark, 1.62 percentage points, is significant because it suggests the technology is ready for real-world deployment in laboratory settings. However, the real innovation isn't just about matching human performance on a benchmark. It's about creating a system that can actually execute experiments on different types of laboratory equipment.

What Makes BioLab Bench Different From Other AI Benchmarks?

BioLab Bench sets a new industry standard for evaluating AI agents in life sciences. Unlike traditional benchmarks that measure whether an AI system generates plausible answers, BioLab Bench evaluates whether an agent can actually execute tasks on real automation equipment. The framework assesses AI performance across multiple dimensions:

  • Real-World Task Coverage: The benchmark spans from fundamental laboratory operations to complex multi-step workflows, stratified across three difficulty levels to test AI performance at different complexity tiers.
  • Full-Chain Assessment: Rather than checking only whether an agent generates a plausible protocol, BioLab Bench evaluates each step, including intent interpretation, protocol design, device-agnostic standard operating procedure generation, device-specific SOP translation, machine code production, and successful execution verification.
  • Cross-Device Transferability: The benchmark can be deployed on different automated laboratory platforms to test whether an AI agent can comprehend an experimental task and generate executable actions adapted to each hardware configuration.

This comprehensive approach addresses a critical gap in AI evaluation. Previous benchmarks measured whether AI could write good protocols. BioLab Bench measures whether those protocols actually work when executed on real machines.
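One way to picture full-chain, cross-device assessment is the Python sketch below. It is an illustrative assumption, not BioLab Bench's published scoring method. The stage labels paraphrase the steps listed above, and the device names and pass/fail values are invented for the example.

```python
from typing import Dict, List, Optional

# Stage labels paraphrased from the article; the identifiers are hypothetical.
STAGES: List[str] = [
    "intent_interpretation",
    "protocol_design",
    "device_agnostic_sop",
    "device_specific_sop",
    "machine_code",
    "execution_verified",
]


def first_failure(stage_results: Dict[str, bool]) -> Optional[str]:
    """Return the earliest stage that did not pass, or None if all passed."""
    for stage in STAGES:
        if not stage_results.get(stage, False):
            return stage
    return None


def summarize(runs: Dict[str, Dict[str, bool]]) -> Dict[str, Optional[str]]:
    """Map each device to its first failing stage (None = full-chain pass)."""
    return {device: first_failure(results) for device, results in runs.items()}


# Invented results for one task attempted on two hypothetical platforms.
all_pass = {stage: True for stage in STAGES}
runs = {
    "platform_a": dict(all_pass),
    "platform_b": {
        **all_pass,
        "device_specific_sop": False,  # translation to this hardware failed
        "machine_code": False,
        "execution_verified": False,
    },
}

report = summarize(runs)
full_chain_pass_rate = sum(v is None for v in report.values()) / len(report)

print(report)                # {'platform_a': None, 'platform_b': 'device_specific_sop'}
print(full_chain_pass_rate)  # 0.5
```

The point of the sketch is that a plausible protocol (the first stages) earns no full-chain credit unless every later stage, through verified execution, also passes on the target device. Reporting the first failing stage shows where in the chain an agent breaks down on a given platform.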

"It reflects a different path from the pure compute race. While leading AI companies rely on scale compute to push the capabilities of general-purpose models, we take a different approach. Through agent scaling and closed-loop data engineering, we organize real-world tasks, device constraints, expert feedback, and wet-lab results into a training ground where AI continuously evolves," noted Dr. Yang Meng, CEO of Genoria AI.

What Does This Mean for the Future of Biological Research?

The implications extend far beyond academic benchmarks. MGI Tech envisions a future where bioagents no longer improve solely through text-based training. Instead, through the Physical AI experimental loop, they will continuously accumulate real research tasks, automation operations, expert validations, failure cases, and wet-lab feedback. This massive corpus of physical experimental data will enable bioagents to develop integrated reasoning, execution, and validation capabilities, ultimately powering 24/7 unattended intelligent laboratories.

This represents a fundamental shift in how AI research is conducted in life sciences. Rather than training AI systems on text and hoping they transfer to real-world problems, researchers are now building AI systems that learn directly from physical experiments. The feedback loop between digital intelligence and physical execution creates a training ground that mirrors how human scientists actually learn.

MGI Tech's exploration of AI in life sciences dates back to 2019. In 2025, the team published research in Nature Biomedical Engineering introducing "PrimeGen," a dry-wet collaborative multi-agent system that integrated primer design, experimental validation, and automated workstation execution into a closed-loop workflow. The new Physical AI initiative builds on this foundation, leveraging MGI's unique hardware-native advantages and real-world deployment expertise gained from over 3,800 users globally.

The research behind ProtoPilot and BioLab Bench was published as a preprint on arXiv in June 2026, making the findings available to the broader research community. This openness reflects a broader trend in AI research toward transparency and reproducibility, even as the technology becomes more sophisticated and capable.