Your AI Agent's Green Security Light Might Be Lying to You
Most AI security scanners only cover about one-third of the actual supply chain risks in agentic systems, missing critical vulnerabilities that don't appear in code manifests until the agent starts running. A fintech platform engineer recently discovered this the hard way: her team's customer-support AI pipeline showed zero security issues on the dashboard, yet research published that same week documented 341 malicious agent skills in a popular marketplace, namespace-hijacking attacks replacing models in production, and techniques to bypass security scanning entirely.
Why Do Standard Security Tools Fail on AI Agents?
The problem isn't that security scanners are broken. It's that they were built for a different world. Traditional software composition analysis (SCA) tools assume every component is a versioned package with a published vulnerability database. They scan your requirements.txt file, check for known security flaws, and call it done. This mental model works fine for conventional software, but it collapses when applied to agentic AI systems.
AI workloads introduce three structurally different supply chain surfaces, and no single scanning methodology covers all three. Model weights aren't packages; they're binary blobs from registries like Hugging Face with no versioning system that traditional scanners understand. The tools an agent invokes aren't packages either; they're JSON-RPC interfaces whose descriptions themselves become threat surfaces. Agent skills installed into marketplaces are bundles of natural-language instructions plus minimal code, where malicious payloads often hide in the prose itself.
The real operational gap emerges at runtime. Half the components a scanner needs to assess don't appear in any manifest until the agent loads them dynamically. Agent frameworks like LangChain pull model adapters on demand, frameworks load tools at first invocation, and Model Context Protocol (MCP) servers establish connections that were never declared in a Kubernetes manifest. A security scan run during the deployment pipeline is reading the floor plan; the building keeps adding rooms after move-in.
What Are the Three Supply Chain Surfaces Agentic Systems Actually Expose?
Understanding where traditional scanners succeed and fail requires breaking down the distinct threat surfaces in agentic AI architectures:
- Code Dependencies: The packages in requirements.txt or package.json plus the framework code itself, including LangChain, vLLM, Triton Inference Server, Ray, FastChat, and agent frameworks like AutoGPT and CrewAI. This surface is closest to traditional SCA's mental model and the one most existing tools handle competently.
- Model Weights and Fine-Tuned Adapters: Binary model files, LoRA adapters, and other model artifacts pulled from registries at runtime. These carry zero versioning information that SCA tools can read and no CVE concept that applies to their content, making them invisible to conventional scanners.
- Dynamic Tool and Skill Loading: Agent skills installed into marketplaces, MCP toolkit connections, and community-contributed tools that load based on prompts or runtime decisions. These components often exist as natural-language descriptions rather than versioned code packages.
The first surface gets genuine coverage from standard SCA tools. Real vulnerabilities exist in the AI framework ecosystem: CVE-2023-29374 in LangChain's LLMMathChain allowed code injection via exec/eval, CVE-2024-21513 in langchain-experimental enabled arbitrary code execution, and most recently CVE-2025-68664, the "LangGrinch" serialization flaw in langchain-core disclosed in December 2025. A competent SCA tool flags these, developers upgrade, and the finding closes.
But the second and third surfaces remain almost entirely dark to traditional scanners. Inference servers ship as containers with embedded model weights; Trivy or Grype catches the Python packages and operating system-layer vulnerabilities, but does not assess anything in the model directory because model weights aren't packages. That work belongs to a completely different scanning methodology.
How to Implement Runtime-Aware Security Scanning for AI Agents
- Pair SCA with Runtime Reachability Analysis: Standard SCA tools should be paired with runtime reachability analysis that only flags vulnerabilities in code paths the workload actually loads into memory and executes. This approach reduces false positives by roughly 90% on production AI workloads because the unused-surface-area problem is structural to how AI frameworks ship.
- Build an AI Bill of Materials from Runtime Evidence: Derive an AI-specific bill of materials (AI-BOM) from what actually loads during execution, not just what appears in deployment manifests. This tells the scanner which framework code paths actually loaded, transforming the vulnerability queue from theoretical to actionable.
- Implement Post-Deployment Continuous Scanning: Recognize that half the components in an agentic system don't exist in any manifest until runtime. Establish continuous scanning that monitors what agents actually load, invoke, and connect to after deployment, not just what was declared at build time.
- Audit Marketplace Skills and Tool Integrations Separately: Agent skills and community-contributed tools require different scanning logic than code packages. Audit these components for malicious payloads in natural-language descriptions, not just code analysis.
The operational reality is that pre-installation scanning alone never solves AI supply chain risk. A scan run at continuous integration and continuous deployment (CI/CD) time captures one snapshot; the building keeps adding rooms after move-in. Security teams need visibility into what actually loads at runtime, which components get invoked by which prompts, and which external services the agent connects to during execution.
What Real-World Evidence Reveals About Current Gaps?
Recent security research has exposed the scale of the problem. Koi Security audited 2,857 community skills on the ClawHub agent-skill marketplace and found 341 carrying malicious payloads. Palo Alto Networks' Unit 42 demonstrated namespace-hijacking attacks that successfully replaced popular Hugging Face models in production Vertex AI deployments. ReversingLabs published "NullifAI," a technique using deliberately broken pickle files to bypass Hugging Face's own scanning pipeline.
These findings underscore why the green dashboard doesn't tell the whole story. The scan ran cleanly against the surface it was built for. The question is whether that surface is the right one for an agentic workload. OWASP recognized this structural gap when it elevated LLM03:2025 Supply Chain in the latest LLM Top 10 released in November 2024 and again in the OWASP Top 10 for Agentic Applications 2026 released in December 2025 at Black Hat Europe, which catalogs risks including tool poisoning and identity abuse paths that explicitly cross supply-chain boundaries.
The operational gap remains: neither catalog tells security teams how to actually scan against the categories it names. That's because the scanning methodology itself needs to change. Misclassifying a finding from one surface as belonging to another is the most common reason teams believe they already do AI supply chain scanning when they're actually just running their traditional SCA tool against the namespace where AI workloads happen to live.
For organizations deploying agentic AI systems in production, the implication is clear: a green security dashboard doesn't mean the system is secure. It means the scanner covered the surface it was designed for. The other two surfaces, where half the actual risk lives, remain largely invisible until something goes wrong.
" }