FrontierNews.ai

A New Protein-Folding AI Just Predicted 1.1 Billion Structures. Here's Why That Matters Beyond AlphaFold

A new artificial intelligence model called ESMFold2 has just predicted the structures of 1.1 billion proteins, far exceeding the number predicted by AlphaFold. Released by the Chan Zuckerberg Biohub (CZ Biohub), the model and its accompanying database, ESM Atlas, represent a significant expansion of humanity's understanding of the protein universe. Unlike the AlphaFold database, ESMFold2's predictions draw on metagenomes: the collective genetic material of microbial communities sampled from soil, oceans, and other environments.

What Makes ESMFold2 Different From AlphaFold?

ESMFold2 is built on a "protein language model," a type of artificial intelligence that learns the patterns of amino acid sequences much as a text language model learns the patterns of words. The researchers trained this model on billions of protein sequences obtained from living organisms, including metagenomic data that had never been included in AlphaFold's training set. This approach allowed ESMFold2 to predict structures for proteins that exist in nature but had never been studied experimentally.
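To make the language-model analogy concrete, here is a toy Python sketch of how training examples are built for masked language modeling, the objective used by earlier ESM protein language models such as ESM-2. Whether ESMFold2 uses the same objective is an assumption here, and the example sequence, the 15% masking rate, and the function names are illustrative only, not taken from the ESMFold2 code. Each amino acid letter is treated as a "word," some are hidden, and a model would be trained to predict the hidden residues from the surrounding ones.

```python
import random

# The 20 standard amino acids as one-letter codes. In a protein language
# model, each letter plays the role that a word plays in a text model.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MASK = "<mask>"


def tokenize(sequence: str) -> list[str]:
    """Split a protein sequence into one token per amino acid residue."""
    tokens = list(sequence.upper())
    unknown = [t for t in tokens if t not in AMINO_ACIDS]
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(set(unknown))}")
    return tokens


def mask_tokens(tokens: list[str], rate: float = 0.15, seed: int = 0):
    """Hide a fraction of residues (simplified masking, illustrative rate).

    Returns the masked token list and a dict mapping each masked position
    to its original residue, i.e. the answers a model would learn to predict.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_mask = max(1, round(len(tokens) * rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = masked[i]  # remember the true residue
        masked[i] = MASK        # hide it from the model's input
    return masked, targets


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical example sequence
    tokens = tokenize(seq)
    masked, targets = mask_tokens(tokens)
    print(" ".join(masked))
    print(targets)
```

Running it prints the sequence with a few residues replaced by `<mask>`, followed by the hidden residues that a real model would be asked to fill in. Learning to make those predictions well across billions of sequences is what lets such a model pick up the "grammar" of proteins.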

The scale of the difference is striking. ESMFold2's predictions exceed those in the AlphaFold database by 800 million structures. Beyond raw numbers, ESMFold2 also shows superior performance in specific tasks. According to CZ Biohub, the model predicts protein complex structures, such as how antibody molecules bind to their target antigens, more accurately than existing models, including AlphaFold3, Google DeepMind's latest version.

To validate their predictions, the research team took an unusual step. They used ESMFold2 to design new antibodies and proteins that bind strongly to proteins involved in cancer and immune diseases, then produced these molecules in the laboratory and confirmed that they functioned as the model predicted.

How Can Researchers Access and Use ESM Atlas?

  • Open Access: ESM Atlas is freely available to all researchers worldwide, with no paywalls or licensing restrictions limiting who can use the data.
  • Massive Scale: The atlas contains 1.1 billion predicted protein structures and 6.8 billion protein sequences, providing an unprecedented resource for biological discovery.
  • Transparency: ESMFold2 is open-source, meaning researchers can examine the model's code and methodology, fostering trust and enabling improvements by the scientific community.

The decision to make ESM Atlas freely available has already generated enthusiasm among the scientific community. Researchers have begun using the database to uncover hidden connections between proteins across different organisms.

What New Discoveries Has ESMFold2 Already Revealed?

Using ESM Atlas, the CZ Biohub team has already made discoveries that challenge existing biological understanding. They found structural similarity between CRISPR-associated defense proteins, which bacteria use to fight off viruses, and gene-editing proteins found in soil fungi. This finding is significant because CRISPR proteins had previously been thought to exist only in prokaryotes such as bacteria. The discovery of similar proteins in eukaryotes such as fungi opens new avenues for understanding how gene-editing systems evolved and how they function.

"ESM Atlas reveals even the most unknown regions of the protein universe. It will provide a powerful foundation for new biological discoveries," said Alex Reeves, chief scientific officer at CZ Biohub and lead author of the ESM Atlas study.


Reeves emphasized that the team has made the atlas freely available to help researchers connect the "known" and "unknown" regions of the protein world, suggesting that many more discoveries await.

Are Scientists Convinced ESMFold2 Is Better Than AlphaFold?

The scientific response has been largely positive but not unanimous. Several leading computational biologists have praised the work. Gemma Atkinson, a professor of computational biology at Lund University in Sweden, stated that "the ESMFold2 database will become an enormous resource for biology," and noted that "large-scale protein language models can capture the fundamental biological rules of proteins."

Christine Orengo, a professor in the Department of Structural and Molecular Biology at University College London, added that "ESMFold2 predictions can help discover new protein folds and functions" and "will significantly influence protein design and our basic understanding of biology."

However, some researchers have expressed caution. Martin Steinegger, a professor at the School of Biological Sciences at Seoul National University, questioned how well ESMFold2 performs on proteins that are very different from previously known structures. He noted that earlier versions of ESMFold struggled with predicting unique protein structures from metagenomic data.

"I am not sure how well ESMFold2 can predict structures for proteins that are very different from previously known proteins," Steinegger said.


Sergey Ovchinnikov, a professor in the Department of Biology at the Massachusetts Institute of Technology, took a middle position, describing ESMFold2 not as a replacement for the AlphaFold database but as a complementary resource that serves different needs.

What Does This Mean for Drug Discovery and Biotech?

The expansion of available protein structures has immediate practical implications for pharmaceutical research. Understanding how proteins fold and interact is fundamental to drug discovery. With 1.1 billion predicted structures now available, researchers have vastly more targets to explore and more information about how potential drugs might interact with disease-causing proteins. If ESMFold2's reported accuracy on protein complexes, particularly antibody-antigen interactions, holds up, it could be especially valuable for immunotherapy and cancer research.

The open-source nature of ESMFold2 also democratizes access to this technology. Smaller biotech companies and academic labs that cannot afford proprietary tools now have access to cutting-edge protein prediction capabilities. This could accelerate innovation in regions and institutions that previously lacked resources for advanced computational biology.

The emergence of ESMFold2 also reflects a broader trend in AI-driven science. Researchers and investors are increasingly betting that artificial intelligence can accelerate scientific discovery across multiple fields. Periodic Labs, a startup founded by former OpenAI and Google DeepMind researchers, is raising $500 million to build autonomous robotic laboratories that run thousands of physics and chemistry experiments to discover new materials. This vision of AI-powered scientific discovery extends well beyond protein folding, suggesting that tools like ESMFold2 may be just the beginning of a larger transformation in how science is conducted.