OpenClaw's Candid Security Playbook: Why the Team Won't Promise Risk-Free AI Agents
OpenClaw, the fastest-growing open-source project in GitHub history, has stopped pretending that autonomous AI agents can be made completely safe. Instead, the team behind the self-hosted AI assistant released a detailed security roadmap that acknowledges real vulnerabilities while implementing practical defenses. The approach marks a shift from marketing hype to engineering honesty in a field where security failures have drawn regulatory scrutiny and researcher warnings.
Austrian developer Peter Steinberger created OpenClaw as a personal project that exploded into a cultural moment. The software runs directly on users' computers, where it can read files, execute shell commands, install plugins, and access the internet. That power came with serious risks. Researchers discovered more than 150,000 vulnerable instances worldwide, malicious plugins distributed through the ClawHub marketplace harvested passwords, and Gartner classified the entire project as an "unacceptable cybersecurity risk."
When Steinberger joined OpenAI in February 2026, he transferred OpenClaw to an independent foundation backed by the company. The project experienced technical problems and declining user numbers. Rather than retreat, the team published a transparent blog post detailing five specific security improvements, each with a clear acknowledgment of what it does and does not solve.
What Are the Five Security Changes OpenClaw Is Rolling Out?
The team addressed vulnerabilities in order of their importance to real-world deployments. Each measure targets a specific class of attack, and the team is candid about the gaps that remain.
- File System Escapes: A new library called fs-safe prevents OpenClaw from accessing files outside its intended folder by blocking tricks like symlinks and absolute paths. The team acknowledged that this is not a true sandbox: plugins with shell command access can still do anything shell commands allow. The protection covers one specific class of errors, namely file paths that resolve outside the intended folder.
- Network Vulnerabilities: A tool called Proxyline routes all network traffic through a central proxy that decides which connections are permitted. This prevents time-of-check to time-of-use (TOCTOU) attacks, in which a URL changes between verification and the actual request. The team noted that certain bypass routes remain possible, but control now lives in one place instead of being scattered across the code.
- Malicious Plugins: ClawHub plugins now receive clear ratings: clean, suspicious, held, quarantined, revoked, or malicious. Malicious plugins cannot be installed at all. Installing plugins from other sources, such as GitHub, remains possible, which preserves users' control over their own computers. Higher trust levels for official packages and verified providers are planned.
- Confirmation Dialog Fatigue: OpenClaw now analyzes commands more intelligently, detecting hidden delete operations inside bash wrappers and highlighting them in the display. For OpenAI users, a separate "Auto Review" agent takes over approvals that would otherwise be manual, reducing the number of prompts users see and click through.
- Automated Code Scanning: The team uses OpenGrep, a tool that checks every proposed code change against 148 rules derived directly from previous security reports. The focus is precision over volume, because a warning system that is too noisy gets ignored.
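The file system item in the list above comes down to a containment check: resolve the requested path, then confirm it still sits under the allowed folder. The following is a minimal Python sketch of that general technique, not fs-safe's actual API, which the blog post summary does not describe. The names `resolve_inside`, `PathEscapeError`, and `demo` are hypothetical and exist only for illustration.

```python
import os
import tempfile
from pathlib import Path


class PathEscapeError(Exception):
    """Raised when a requested path would land outside the allowed root."""


def resolve_inside(root: Path, requested: str) -> Path:
    """Return the fully resolved path for `requested`, or raise if it escapes `root`.

    Hypothetical helper for illustration; not part of fs-safe.
    """
    root_resolved = root.resolve()
    # Joining with an absolute `requested` discards the root, and resolve()
    # follows symlinks and collapses "..", so all three tricks end up in the
    # same containment check below.
    candidate = (root_resolved / requested).resolve()
    if not candidate.is_relative_to(root_resolved):  # Python 3.9+
        raise PathEscapeError(f"{requested!r} resolves outside {root_resolved}")
    return candidate


def demo() -> dict[str, bool]:
    """Try four requests against a throwaway workspace; True means allowed."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        root = base / "workspace"
        root.mkdir()
        outside = base / "secret.txt"
        outside.write_text("not for the agent")
        (root / "notes.txt").write_text("fine")
        # Symlink inside the workspace that points outside it
        # (creating symlinks may need extra privileges on Windows).
        os.symlink(outside, root / "link")

        requests = {
            "plain": "notes.txt",
            "dotdot": "../secret.txt",
            "absolute": str(outside),
            "symlink": "link",
        }
        results = {}
        for label, requested in requests.items():
            try:
                resolve_inside(root, requested)
                results[label] = True
            except PathEscapeError:
                results[label] = False
        return results


if __name__ == "__main__":
    print(demo())
```

As the team says of fs-safe itself, a check like this is not a sandbox. It does nothing about shell commands, and a path that is swapped for a symlink after the check but before the file is opened would still get through unless the open call is hardened as well.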
The blog post's language was notably different from typical vendor security announcements. Instead of claiming the problems were solved, the team explained what each measure actually prevents and where risks remain. "Anyone who promises a risk-free agent is selling something," the post stated.
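The idea behind detecting hidden delete operations inside bash wrappers can be illustrated with a small tokenizer-based check. This is a rough Python sketch under stated assumptions, not OpenClaw's analyzer: the command lists, the function names, and the fail-closed handling of unparsable input are all hypothetical choices made for the example.

```python
import os
import shlex

# Hypothetical, deliberately short lists for the sketch.
DESTRUCTIVE = {"rm", "rmdir", "shred", "unlink"}
SHELL_WRAPPERS = {"bash", "sh", "zsh", "dash"}
PREFIXES = {"sudo", "nohup", "time"}
SEPARATOR_CHARS = set(";&|()")


def tokenize(command: str) -> list[str]:
    """Split a shell command into words and operator tokens (Python 3.8+)."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # treat '#' as an ordinary character
    return list(lexer)


def split_simple_commands(tokens: list[str]) -> list[list[str]]:
    """Group tokens into simple commands, breaking at ;, &&, ||, |, & and parens."""
    commands, current = [], []
    for tok in tokens:
        if tok and set(tok) <= SEPARATOR_CHARS:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        commands.append(current)
    return commands


def find_destructive(command: str, max_depth: int = 5) -> list[str]:
    """Return delete-style commands found in `command`, including ones nested
    inside `bash -c '...'` style wrappers."""
    if max_depth < 0:
        return []
    try:
        tokens = tokenize(command)
    except ValueError:
        # Unbalanced quotes: fail closed and flag the whole string.
        return [command]
    findings = []
    for words in split_simple_commands(tokens):
        while words and words[0] in PREFIXES:
            words = words[1:]
        if not words:
            continue
        name = os.path.basename(words[0])
        if name in DESTRUCTIVE:
            findings.append(" ".join(words))
        elif name in SHELL_WRAPPERS and "-c" in words:
            idx = words.index("-c")
            if idx + 1 < len(words):
                # The argument after -c is itself a shell script: recurse into it.
                findings.extend(find_destructive(words[idx + 1], max_depth - 1))
    return findings


if __name__ == "__main__":
    for cmd in ["ls -la; echo done", "bash -c 'cd /tmp && rm -rf build'"]:
        print(cmd, "->", find_destructive(cmd))
```

A check like this only narrows what needs human attention. It misses deletions expressed through `eval`, variables, `xargs`, `find -delete`, or scripts on disk, which is consistent with the team's position that no single measure removes the risk.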
Why Is OpenClaw Being So Honest About What It Cannot Fix?
The security improvements came after months of criticism from researchers, regulators, and enterprise security teams. Singapore's government issued a warning about OpenClaw's risks. The project's viral success in January 2026 had created a gap between hype and reality that no amount of marketing could close.
Steinberger himself acknowledged the constraints. He admitted that he no longer had enough time to maintain the project alone, which is why OpenAI's backing became necessary. The foundation's approach reflects a mature understanding of what autonomous agents actually are: powerful tools with irreducible risks that can be managed but not eliminated.
The candor also serves a practical purpose. Enterprise teams evaluating OpenClaw can now make informed decisions about where to deploy it and what guardrails to add. A team running OpenClaw on isolated machines with no internet access faces different risks than one running it on developer laptops with full network access. The security roadmap gives organizations the information they need to make those trade-offs.
How to Evaluate OpenClaw's Security for Your Use Case
- Deployment Context: Consider whether OpenClaw will run on isolated machines, developer workstations, or servers with network access. Each context requires different security assumptions and additional controls.
- Plugin Trust Model: Decide whether to allow plugins only from ClawHub's verified providers, permit community plugins from GitHub, or restrict installs to official packages. Each choice trades convenience for security.
- Network Isolation: Evaluate whether to route OpenClaw through an existing corporate proxy using Proxyline, run it on machines with no internet access, or allow direct connections. Proxy routing provides visibility but requires infrastructure.
- User Oversight: Determine how much manual review of agent actions is practical for your team. The Auto Review feature reduces confirmation fatigue but adds AI model calls and latency.
- Monitoring and Logging: Plan for continuous monitoring of what OpenClaw actually does, not just what it is supposed to do. The security roadmap assumes active oversight, not passive trust.
The foundation's approach also reflects broader industry trends. As autonomous AI agents move from research projects to production systems, the security conversation is shifting from "Is this safe?" to "What are the specific risks, and how do we manage them?" OpenClaw's transparency positions it as a reference point for that conversation.
Steinberger's move to OpenAI and the project's transition to a foundation suggest that the future of OpenClaw depends on sustained engineering investment, not viral momentum. Steinberger's team spent $1.3 million in OpenAI API tokens in a single month running 100 autonomous agents for development tasks, which shows the scale of compute required to maintain and improve the project.
The five-point security plan is not a final answer. It is a checkpoint in an ongoing process of learning from failures, implementing defenses, and being honest about what remains uncertain. For teams considering OpenClaw, that honesty may be more valuable than any promise of perfect safety.