Grok and Other AI Chatbots Face US Government Safety Testing Before Public Release
The US Department of Commerce will now test new artificial intelligence tools from Google, Microsoft, and xAI, including Grok, before they reach the public. The three tech companies have voluntarily agreed to submit their models for evaluation through the Commerce Department's Centre for AI Standards and Innovation (CAISI), expanding a safety testing program that previously included only OpenAI and Anthropic.
Why Is the US Government Suddenly Testing AI Chatbots?
The move toward government oversight marks a notable shift, particularly given the Trump administration's stated commitment to deregulation and removing "red tape" around AI development. The expansion of safety testing, however, reflects growing concern about AI capabilities and their potential risks. The military's increasing reliance on AI systems, combined with recent claims by Anthropic that it developed a model called Mythos that is too powerful for public release, has prompted the White House to reconsider its hands-off approach.
CAISI has already conducted 40 evaluations of AI tools, including some unreleased state-of-the-art models. The centre did not say which models, if any, have been held back as a result, but the program's existence signals that officials consider some AI systems too risky for widespread deployment without government review.
What Specific AI Models Are Being Tested?
The three companies submitting models for testing are among the largest players in the AI industry. Google's primary AI tool is Gemini, a chatbot available across Google products and now used by US defence and military agencies. Microsoft's offering is Copilot, the AI assistant integrated into Windows and Microsoft 365. xAI's sole product is Grok, a chatbot that has drawn public scrutiny for, among other issues, generating inappropriate images of people.
The evaluations are designed to assess safety and security before commercial release and will cover:
- Testing Protocols: Comprehensive evaluation of AI model performance, safety, and potential vulnerabilities before public deployment
- Collaborative Research: Joint efforts between government agencies and tech companies to identify and address emerging AI risks
- Best Practice Development: Creation of industry standards and guidelines for responsible AI system development and deployment
Microsoft acknowledged the importance of this collaboration in a corporate blog post, stating that while the company already tests its AI models internally, "testing for national security and large-scale public safety risks necessarily must be a collaborative endeavour with governments".
"These expanded industry collaborations help us scale our work in the public interest at a critical moment," said Chris Fall, director of CAISI.
Google's DeepMind subsidiary declined to comment on the testing arrangement, while a representative from SpaceX, the Elon Musk company that now controls xAI, did not respond to requests for comment.
How Does This Fit Into Broader AI Regulation Efforts?
The voluntary safety testing program represents an evolution of agreements reached during the Biden administration with companies like OpenAI and Anthropic. The expansion to include Google, Microsoft, and xAI signals that government oversight of AI development is becoming more comprehensive, even as the Trump administration pursues a broader deregulation agenda.
The timing is significant. Last year, President Donald Trump signed executive orders establishing his administration's "AI Action Plan," which aimed to "remove red tape and onerous regulation" around AI development. Yet the decision to expand safety testing suggests that national security concerns and the military's growing dependence on AI systems have created pressure for at least some degree of government review.
Senior members of Trump's staff met last month with Anthropic CEO Dario Amodei, even as the company is embroiled in a lawsuit with the US Department of Defense over its refusal to remove safety guardrails for government use of its models. This tension between deregulation and security concerns underscores the complexity of AI governance in the current political environment.
The voluntary nature of these agreements means companies maintain significant control over their development processes, but the government's ability to evaluate models before release provides a safety checkpoint that didn't exist for most commercial AI systems until recently. Whether this approach will prove sufficient to address emerging AI risks remains an open question as the technology continues to advance rapidly.