FrontierNews.ai

The Data Crisis Threatening AI's Promise in Biology: Why the U.S. Is Falling Behind

The United States is losing ground in a critical race: building the biological data infrastructure that artificial intelligence needs to revolutionize medicine, agriculture, and public health. While international competitors are constructing coordinated AI-biology ecosystems, America's biological data environment remains fragmented, underfunded, and insecure, according to a new policy report from the Council on Strategic Risks. This gap threatens not only breakthrough discoveries but also national security, because the same AI tools that could accelerate drug development could also be misused to engineer bioweapons.

Why Is Biological Data So Critical for AI Right Now?

Artificial intelligence and machine learning are already transforming biology. Tools like AlphaFold can model protein structures and their interactions; BoltzGen can design novel protein binders for drug discovery; and ART (the Automated Recommendation Tool) lets researchers predict and optimize synthetic biology experiments computationally before stepping into a lab. These breakthroughs depend on large, well-organized datasets that AI systems can learn from. Without such data, AI tools produce unreliable results, wasting time and money on dead-end research.

The problem is that biological data is extraordinarily messy. The same biological process can be studied using different model systems, different measurement techniques, and different instruments, producing data in incompatible formats with vastly different structures. Integrating these datasets into a single, AI-ready resource requires enormous curation effort, and the U.S. government has not yet made this a coordinated national priority.

What Are the Real-World Applications at Stake?

The potential benefits of AI-driven biology extend across multiple domains. Researchers are already using these tools to find patterns in biological data that enable personalized medicine, predict emerging environmental threats to support precision agriculture, and accelerate the pace of discovery, testing, and manufacturing. At the Gates Foundation's Grand Challenges Annual Meeting in London, ten Cambridge scholars presented research demonstrating this potential in action.

One scholar, Blessing Abodunrin, is studying DNA replication in malaria parasites using cutting-edge DNA sequencing technology integrated with AI and bioinformatics. Another, Mahya Fazel-Zarandi, is developing CRISPR-engineered fallopian tube organoids to model early ovarian cancer and identify the genomic changes that drive disease. These projects show how AI and genomic tools are already being deployed to tackle some of humanity's most pressing health challenges.

How Can Policymakers and Institutions Build Better Data Infrastructure?

  • Standardize Data Formats: Establish common standards for how biological data is collected, stored, and labeled so that datasets from different sources can be combined and used by AI systems without extensive manual conversion.
  • Invest in Data Curation: Fund dedicated teams to integrate heterogeneous biological datasets into cohesive, validated resources that reflect real biological processes and maintain scientific accuracy.
  • Secure Biological Data: Implement robust cybersecurity and access controls to prevent misuse of sensitive biological information while still allowing legitimate researchers to access it for beneficial research.
  • Create Public-Private Partnerships: Coordinate between government agencies, academic institutions, and commercial entities to generate and share biological data in ways that accelerate innovation while protecting national security.
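
To make the first two recommendations concrete, here is a minimal Python sketch of what converting records from two sources into one shared format might look like. Everything in it is an illustrative assumption rather than anything specified in the report: the two "labs," their field names, the `Measurement` schema, and the choice of ng/mL as the common unit are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical shared schema. Field names are illustrative assumptions,
# not a standard named in the report.
@dataclass(frozen=True)
class Measurement:
    sample_id: str
    analyte: str
    value_ng_per_ml: float
    source: str

# Multipliers that convert each reported unit into the shared unit (ng/mL).
TO_NG_PER_ML = {"ng/mL": 1.0, "ug/mL": 1000.0, "pg/mL": 0.001}


def from_lab_a(row: dict) -> Measurement:
    """Hypothetical Lab A exports the keys 'id', 'protein', 'conc', 'unit'."""
    if row["unit"] not in TO_NG_PER_ML:
        raise ValueError(f"unknown unit: {row['unit']!r}")
    value = float(row["conc"]) * TO_NG_PER_ML[row["unit"]]
    return Measurement(row["id"], row["protein"].upper(), value, "lab_a")


def from_lab_b(row: dict) -> Measurement:
    """Hypothetical Lab B exports 'SampleName', 'Target', 'pg_per_ml'."""
    value = float(row["pg_per_ml"]) * TO_NG_PER_ML["pg/mL"]
    return Measurement(row["SampleName"], row["Target"].upper(), value, "lab_b")


def harmonize(lab_a_rows: list, lab_b_rows: list) -> list:
    """Map both sources onto the shared schema, rejecting invalid values."""
    records = [from_lab_a(r) for r in lab_a_rows]
    records += [from_lab_b(r) for r in lab_b_rows]
    for rec in records:
        if rec.value_ng_per_ml < 0:
            raise ValueError(f"negative concentration for {rec.sample_id}")
    return records


if __name__ == "__main__":
    lab_a = [{"id": "S1", "protein": "il6", "conc": "1.5", "unit": "ug/mL"}]
    lab_b = [{"SampleName": "S2", "Target": "IL6", "pg_per_ml": 2500}]
    for rec in harmonize(lab_a, lab_b):
        print(rec)
```

Even this toy case needs a per-source adapter, a unit table, and a validation step. Real biological datasets differ in far more than column names and units, which is why curation at scale requires dedicated funding and staff.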

What Are the Dual-Use Risks That Make This Urgent?

The same AI tools that promise to cure disease also pose biosecurity risks. As commercial entities and nation-states develop AI capabilities, the potential for deliberate or accidental misuse grows. AI can be used to generate whole-genome sequences, guide genome editing via CRISPR, infer phenotype from genotype, and enable computational structural design of biological systems. In the wrong hands, these capabilities could accelerate the development of bioweapons or engineered pathogens.

This dual-use concern is not theoretical. The Biological Weapons Convention, which entered into force in 1975, still lacks a verification protocol, making misuse difficult to detect or prevent. Some AI companies, including OpenAI, are taking steps to address this risk. OpenAI is establishing the Rosalind Biodefense Program to support defense biology and pandemic preparedness work while guarding against malicious use.

The fiscal year 2026 National Defense Authorization Act included specific measures on generating biological data to advance AI and securing U.S. biological data, signaling that policymakers recognize the stakes. However, experts argue that these policy initiatives must be paired with concrete investments in data generation, security, and standardization.

How Are Regional Health Systems Using AI and Genomics Today?

Beyond national policy, regional institutions are already demonstrating how AI and genomic tools can improve public health. Arizona State University's Health Observatory recently expanded its collaboration with TGen North, bringing the Pathogen Intelligence Center into the observatory to accelerate health research and innovation across Arizona. The observatory draws on data from electronic medical records, genomic sequencing, air quality monitoring, and insurance coverage.

"The goal of the Health Observatory is to transform health data into health knowledge. We generate a lot of health data and work with resources from the state health department, Medicaid services, hospitals, health care facilities and others, and use advanced analytical tools, modeling and AI to generate new health knowledge," said Dave Engelthaler, executive director of the observatory.

The collaboration is designed to support precision health and data-driven approaches to care, particularly for rural and tribal communities that have historically been underserved. The Pathogen Intelligence Center brings two decades of experience using genetic tools to understand how diseases emerge, spread, and affect different populations. By integrating laboratory capacity, epidemiology, data science, and public communication, ASU aims to create a more coordinated statewide effort to address regional health challenges ranging from Valley fever to measles outbreaks.

The challenge ahead is not simply producing more information, but ensuring that health information is understandable, trusted, and useful to the communities it serves. As Engelthaler noted, dashboards and graphs are useful for some people but not for everyone, and they often lack the context needed to help people understand what health risks mean in their own lives. The Health Observatory is exploring web-based tools, community presentations, and immersive visualization experiences to translate complex health data into actionable knowledge.

The convergence of these efforts, at both the national policy level and in regional health systems, reflects a growing recognition that AI's potential in biology depends on solving the data problem first. Without coordinated, secure, well-curated biological datasets, neither the promise of personalized medicine nor the tools to defend against biosecurity threats can be fully realized.