Hey there 👋,
Welcome back to Issue #2 of AI Security Intelligence.
I started this newsletter because I found myself struggling to keep up with the mind-boggling speed at which Generative AI, LLMs, and Agentic AI are being embraced across every domain. And the Veracode 2026 report, covered below, confirms it with real-world numbers.
The security side of this adoption is either not taken seriously or not well understood, and the consequences are far more severe than most people realize.
Yes, AI and AI Agents seem to work magic. But let's not shoot ourselves in the foot by introducing attack surface whose consequences we can't afford to ignore.
Last week I said the pattern from DockerDash (untrusted context flowing through MCP into unvalidated tool execution) would repeat. It took exactly seven days. This week brought three critical Claude Code vulnerabilities exploiting the same trust boundary confusion, the first documented case of infostealers targeting AI agent identity files, and Amazon's forensic breakdown of a low-skill attacker using commercial AI to pop 600+ firewalls across 55 countries.
And of course, the devastating ClawJacked flaw that let malicious websites hijack local OpenClaw AI agents via WebSocket.
The theme this week is uncomfortable: the tools we're building to make developers faster are also making attackers faster, and the AI agents we're deploying are becoming high-value targets in their own right.
Dense issue. Let's get into it.
AI Threats & Incidents
ClawJacked: Malicious Websites Can Hijack Local OpenClaw AI Agents via WebSocket
Oasis Security disclosed a high-severity vulnerability in OpenClaw's core gateway, codenamed ClawJacked, that allows any website to silently take over a locally running AI agent.
The attack chain: malicious JavaScript on a web page opens a WebSocket connection to localhost on the OpenClaw gateway port, brute-forces the gateway password (no rate limiting), registers as a trusted device (auto-approved without user prompt for local connections), and gains complete control over the agent. From there, the attacker can interact with the agent, dump configuration data, enumerate connected nodes, and read application logs.
The critical design flaw: the gateway relaxes several security mechanisms for local connections, including silently approving new device registrations. OpenClaw patched within 24 hours in version 2026.2.25. But the blast radius extends further: reports from Bitsight and NeuralTrust detail how exposed OpenClaw instances can be weaponized through prompt injections embedded in emails or Slack messages processed by the agent. (OWASP: LLM01 Prompt Injection + LLM03 Supply Chain + LLM06 Excessive Agency)
💡 This is a full-spectrum attack on the agentic AI ecosystem, all in one week. WebSocket hijacking of the core gateway, log poisoning for indirect prompt injection, a malicious skill marketplace with agent-to-agent social engineering. The Moltbook attack is especially worth studying: an AI agent posing as a legitimate peer on a social network for agents, promoting malicious skills to other agents who trust it by default. That is supply chain poisoning adapted for the agentic era. Microsoft's recommendation to treat OpenClaw as untrusted code execution is the right call, and every team deploying local AI agents should apply the same standard regardless of framework.
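If you run a local agent, the precondition for this attack is easy to check for yourself. A minimal sketch, assuming the gateway listens on a localhost TCP port (the port number below is a placeholder, not OpenClaw's actual default): anything accepting connections on loopback is also reachable from JavaScript in any browser tab via WebSocket, because the same-origin policy does not block WebSocket connection attempts.

```python
import socket

def gateway_port_exposed(port: int, host: str = "127.0.0.1",
                         timeout: float = 1.0) -> bool:
    """Return True if something is listening on the given localhost port.

    A listening agent gateway is reachable attack surface: any web page
    loaded in a local browser can attempt the same connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 18789 is a hypothetical port; substitute your gateway's configured one.
print(gateway_port_exposed(18789))
```

A True result only proves the socket is reachable, not that the gateway is vulnerable; but after ClawJacked, "reachable from a browser" alone deserves a threat-model entry.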
Claude Code RCE + API Key Exfiltration via Malicious Repos (CVE-2025-59536, CVE-2026-21852)
Check Point Research disclosed three critical vulnerabilities in Anthropic's Claude Code that turn repository configuration files into active attack vectors. The attack is devastatingly simple: an attacker commits a malicious .claude/settings.json to a repo, a developer clones and opens it, and Claude Code executes arbitrary commands before the trust dialog even appears. From there, a stolen API key grants access to the entire Workspace: attackers can read and write all shared files, upload malicious content, and exhaust credits. All three vulnerabilities are patched. (OWASP: LLM03 Supply Chain + LLM06 Excessive Agency)
💡 This is the AI supply chain version of the classic "malicious .git hooks" attack, but worse. Traditional git hooks require explicit execution; Claude Code's configuration files execute implicitly because the tool treats them as trusted operational logic.
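One practical mitigation is to lint a freshly cloned repo's .claude/settings.json before opening it in the tool. A rough sketch; the SUSPECT_KEYS set is an assumption about which fields can carry executable behavior, so adjust it to whatever your tool version actually honors:

```python
import json
from pathlib import Path

# Keys whose values an AI coding tool may treat as commands to run.
# This set is an assumption -- map it to your tool's real config schema.
SUSPECT_KEYS = {"hooks", "apiKeyHelper", "env"}

def audit_claude_settings(repo: Path) -> list[str]:
    """Flag executable-looking fields in a repo's .claude/settings.json
    BEFORE the repo is opened in an AI coding tool."""
    settings = repo / ".claude" / "settings.json"
    if not settings.is_file():
        return []
    try:
        data = json.loads(settings.read_text())
    except json.JSONDecodeError:
        return [f"{settings}: unparseable JSON (itself suspicious)"]
    if not isinstance(data, dict):
        return [f"{settings}: unexpected top-level type"]
    return [f"{settings}: defines '{key}' -> review before trusting"
            for key in SUSPECT_KEYS & data.keys()]
```

Wiring this into a post-clone git alias or a CI check costs minutes and covers the exact window the exploit relied on: the gap between clone and trust dialog.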
AI-Augmented Threat Actor Breaches 600+ FortiGate Devices Across 55 Countries
Amazon Threat Intelligence published a detailed forensic report on a Russian-speaking, financially motivated actor who used multiple commercial GenAI services to compromise 600+ FortiGate devices between January 11 and February 18, 2026. No vulnerabilities were exploited. The campaign succeeded entirely through exposed management ports and weak credentials with single-factor authentication. What makes this significant: the actor had limited technical skills. AI wrote their Python recon scripts, generated step-by-step attack plans from stolen network topologies, parsed and decrypted FortiGate configs, and planned lateral movement into Active Directory and Veeam backup infrastructure (a classic pre-ransomware pattern). When the actor hit hardened environments, they simply moved on. (MITRE ATLAS: AML.T0054 AI-Assisted Techniques)
💡 This is the report that settles the "will AI help attackers?" debate. Not theoretically, but forensically. Amazon's finding that a low-skill individual achieved the operational scale of a mid-tier APT group through AI augmentation is the clearest evidence yet that GenAI is a force multiplier for offense. But note also that every single compromise would have been prevented by basic security hygiene: no exposed management ports, no default credentials, MFA enabled. AI didn't unlock new attack techniques; it just removed the skill floor for existing ones. The defensive implication: your security fundamentals are now your AI defense strategy.
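Since the entire campaign rode on hygiene failures, the defensive move is an audit of your own inventory for exactly those three conditions. An illustrative sketch over an asset-inventory export; the field names (mgmt_port_public, mfa, password) are hypothetical and should be mapped to however your inventory actually records these facts:

```python
# Weak/default credentials worth flagging; extend for your environment.
DEFAULT_CREDS = {"admin", "password", "fortinet", ""}

def hygiene_findings(device: dict) -> list[str]:
    """Flag the three hygiene failures the FortiGate campaign relied on.

    Field names are illustrative placeholders for your inventory schema.
    """
    findings = []
    if device.get("mgmt_port_public"):
        findings.append("management interface exposed to the internet")
    if not device.get("mfa"):
        findings.append("single-factor authentication")
    if device.get("password", "").lower() in DEFAULT_CREDS:
        findings.append("default or empty credential")
    return findings
```

Any device returning a non-empty list would have been in this actor's target pool; the attacker demonstrably skipped hardened environments rather than working through them.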
First-Ever Infostealer Caught Stealing AI Agent Identity: OpenClaw Configuration Exfiltrated
Hudson Rock documented the first in-the-wild case of infostealer malware harvesting an AI agent's complete identity. A Vidar variant's broad file-grabbing routine scooped up the victim's .openclaw directory, capturing: openclaw.json (gateway authentication token, email, workspace path), device.json (public and private cryptographic keys for device pairing), and soul.md + MEMORY.md (the agent's behavioral instructions, daily activity logs, private messages, and calendar data). (OWASP: LLM02 Sensitive Information Disclosure + LLM03 Supply Chain)
💡 Hudson Rock calls this "the transition from stealing browser credentials to harvesting the souls and identities of personal AI agents." That's not hyperbole. An OpenClaw identity file doesn't just give you a password; it gives you the agent's entire operational context, behavioral instructions, cryptographic keys, and a memory file that maps the victim's life. This is identity theft at a layer that didn't exist 18 months ago.
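One cheap control you can apply today: treat the agent config directory like an SSH key directory and audit its file permissions. A minimal sketch using the file names from the Hudson Rock report; it won't stop an infostealer running as the victim, but group/world-readable keys also leak to every other local account and to any grabber running under a different user:

```python
import stat
from pathlib import Path

# File names taken from the Hudson Rock write-up on the .openclaw directory.
SENSITIVE = ("openclaw.json", "device.json", "soul.md", "MEMORY.md")

def loose_permissions(agent_dir: Path) -> list[str]:
    """Flag agent identity files readable beyond the owning user."""
    findings = []
    for name in SENSITIVE:
        f = agent_dir / name
        if not f.is_file():
            continue
        mode = f.stat().st_mode
        if mode & (stat.S_IRGRP | stat.S_IROTH):
            findings.append(
                f"{f}: readable beyond owner (mode {stat.filemode(mode)})")
    return findings
```

Pair this with file-access monitoring on the same paths: commodity stealers use broad file-grabbing routines, so reads of soul.md by anything other than the agent process are a high-signal alert.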
Model Security & AI-on-AI Attacks (New Category)
Anthropic Discloses Industrial-Scale Distillation Attacks by DeepSeek, Moonshot AI, and MiniMax
Anthropic published detailed attribution of three distillation campaigns by Chinese AI labs that generated over 16 million exchanges with Claude through ~24,000 fraudulent accounts. DeepSeek (150K+ exchanges) targeted reasoning and censorship-safe alternatives to politically sensitive queries. Moonshot AI (3.4M+ exchanges) targeted agentic reasoning, tool use, computer-use agents, and later attempted to extract and reconstruct Claude's reasoning traces. MiniMax (13M+ exchanges) ran the largest campaign, targeting agentic coding and orchestration. Anthropic detected this while it was still active and watched MiniMax pivot 50% of traffic within 24 hours when a new Claude model dropped. The campaigns used "hydra cluster" proxy networks where a single setup controlled 20,000+ accounts simultaneously, mixing distillation traffic with legitimate requests to evade detection. Anthropic attributed each campaign through IP correlation, request metadata, and in some cases matched requests to specific researchers at the labs. This follows Google's disclosure earlier this month of similar extraction attacks against Gemini's reasoning capabilities.
💡 Two things matter here beyond the headline. First, the operational tradecraft: load-balancing across accounts, mixing extraction traffic with legitimate requests, pivoting to new models within 24 hours of release. This isn't ad hoc, it's a production-grade data pipeline for systematic capability theft. Second, the national security angle: Anthropic specifically notes that illicitly distilled models are unlikely to retain safety guardrails, meaning dangerous capabilities proliferate with protections stripped out. The distillation attack surface is now a first-class security concern for any frontier AI lab, and it directly undermines the export control regime that assumes chip restrictions alone limit capability transfer.
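The hydra pattern has a detectable shape: heavy volume spread across many accounts that share a network fingerprint. A toy heuristic along those lines, with thresholds that are purely illustrative (real detection at Anthropic's scale involves far richer metadata correlation than this):

```python
from collections import defaultdict

def suspicious_clusters(requests, min_accounts=50, min_total=10_000):
    """Flag network fingerprints that spread heavy volume across many
    accounts -- the 'hydra cluster' shape. `requests` is an iterable of
    (fingerprint, account_id) pairs; thresholds are illustrative only."""
    accounts = defaultdict(set)
    volume = defaultdict(int)
    for fp, account in requests:
        accounts[fp].add(account)
        volume[fp] += 1
    return [fp for fp in accounts
            if len(accounts[fp]) >= min_accounts
            and volume[fp] >= min_total]
```

The evasion countermove described in the report, mixing extraction traffic with legitimate requests, is exactly what makes naive per-account rate limits insufficient and cluster-level correlation necessary.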
MCP & Agent Security
30 CVEs in 6 Weeks: MCP's Attack Surface Expands Into Three Distinct Layers
Kai Security's autonomous scanning agent documented 30 CVEs across the MCP ecosystem between January and February 2026, revealing that the attack surface has expanded beyond the original server-layer exec() injection pattern into three distinct tiers. Layer 1 (43% of CVEs): the familiar exec()/shell injection family. Layer 2 (20%): tooling and infrastructure, including MCP inspectors, scanners, and host applications. This includes CVE-2025-66401 (a security scanner for MCP that itself has command injection) and CVE-2026-23744 (MCPJam Inspector, a dev platform that exposes an unauthenticated endpoint capable of installing arbitrary MCP servers, listening on 0.0.0.0 by default). Layer 3 (13%): authentication bypass on critical endpoints.
💡 The meta-irony of a security scanner having command injection is almost too perfect. But the real insight is the attack surface migration pattern: server fixes push attackers to tooling, tooling fixes will push them to client applications. We're watching the MCP security maturity curve compress years of web security evolution into months.
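Two of the three layers reduce to the same pair of misconfigurations: listening beyond loopback (the MCPJam 0.0.0.0 default) and missing authentication on the transport. A cheap config lint catches both; the field names here ("host", "auth") are assumptions, so map them to your MCP server framework's actual settings:

```python
def lint_mcp_config(config: dict) -> list[str]:
    """Lint for the two failure modes behind many recent MCP CVEs:
    binding beyond loopback and running with transport auth disabled.
    Field names are assumptions about a generic server config schema."""
    warnings = []
    if config.get("host", "127.0.0.1") in ("0.0.0.0", "::"):
        warnings.append("binds to all interfaces; prefer 127.0.0.1")
    if not config.get("auth", {}).get("enabled", False):
        warnings.append("transport authentication disabled")
    return warnings
```

Layer 1 (the exec() family) needs code review rather than config lint, but a grep for shell=True and string-built commands across your MCP servers is the analogous five-minute check.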
Kali Linux Ships Native Claude AI Integration via MCP
Kali Linux officially documented a native AI-assisted penetration testing workflow integrating Anthropic's Claude via MCP. The architecture: Claude Desktop (macOS/Windows) as the UI, Claude Sonnet 4.5 as the intelligence layer, and a Kali instance running mcp-kali-server (a Flask-based API on localhost:5000) as the execution layer. Testers type natural language prompts like "Port scan scanme.nmap.org and check if a security.txt file exists", Claude selects the tool, executes via MCP, parses results, and iterates. Supported tools include Nmap, Gobuster, Dirb, Nikto, Hydra, SQLMap, Metasploit, and others.
💡 This is a milestone worth flagging. Not because it's technically novel (people have been scripting LLMs into pentest workflows for a year), but because it's official Kali documentation blessing the pattern. It legitimizes MCP as the integration layer for offensive security tooling.
Reports & Research
Veracode 2026 State of Software Security: AI-Driven Development Outpaces Security
Veracode's annual report, based on 1.6 million applications generating 141.3 million raw findings, delivers a stark conclusion: "The velocity of development in the AI era makes comprehensive security unattainable." Key numbers: 82% of organizations now harbor security debt (up from 74% last year), 60% carry critical security debt (20% YoY increase), and high-risk vulnerabilities spiked 36% YoY. The report explicitly identifies AI-driven development as a factor: more code ships faster, but remediation hasn't scaled to match. The bleak quote: "the remediation gap has reached crisis proportions; incremental improvements insufficient; transformational change required."
💡 Pair this with the Cloudflare disclosure that they built a significant application in a week with AI and "no human review of most of the code." The Veracode data quantifies the debt that vibe-coding creates at enterprise scale. The uncomfortable implication: AI coding tools are increasing organizational attack surface faster than AI security tools can reduce it. Every security team needs to be asking: what is our flaw creation-to-remediation ratio, and how has it changed since we adopted AI coding assistants?
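The ratio question is answerable from data most AppSec teams already have. One way to track it, sketched below; the interpretation is mine, not Veracode's metric:

```python
def remediation_ratio(opened: int, closed: int) -> float:
    """Flaws opened per flaw closed in a period. Above 1.0 means
    security debt is growing. Track it per team, and compare the
    windows before and after adopting AI coding assistants."""
    if closed == 0:
        return float("inf") if opened else 0.0
    return opened / closed
```

A quarter-over-quarter plot of this number is the single clearest way to show leadership what "development velocity outpacing remediation" means for your organization specifically.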
CoSAI Releases MCP Security Whitepaper: 12 Threat Categories, ~40 Distinct Threats
The Coalition for Secure AI (an OASIS Open Project backed by Anthropic, Google, IBM, Meta, Microsoft, NVIDIA, and others) released a comprehensive MCP security framework. The paper identifies 12 core threat categories spanning nearly 40 distinct threats, from familiar concerns amplified by AI mediation (identity spoofing, tool poisoning) to novel attack vectors unique to agent-based systems (full schema poisoning, shadow MCP servers, resource content poisoning, typosquatting/confusion attacks). Key recommendations: end-to-end agent identity and traceability, least-privilege access for all MCP servers, zero-trust validation for AI outputs, and hardware-based isolation. The whitepaper is on GitHub.
💡 This is the closest thing we have to a canonical threat model for MCP. If you're building or securing MCP deployments and haven't read it, stop here and go read it. Two CoSAI sessions are also on the RSAC 2026 agenda (March 25-26) covering MCP-specific defenses, worth attending if you're going.
Cisco Expands AI Defense for the Agentic Era: AI BOM, MCP Catalog, Multi-Turn Red Teaming
Cisco announced the largest expansion of AI Defense since its January 2025 launch. New capabilities include: AI BOM (Bill of Materials) for centralized visibility across AI assets including MCP servers and third-party dependencies; an MCP Catalog for discovering, inventorying, and managing risk across MCP servers and registries; advanced multi-turn algorithmic red teaming for models and agents across multiple languages; and real-time agentic guardrails that monitor for tool poisoning and unauthorized tool use. Cisco SASE also gains MCP visibility with logging and policy control, plus intent-aware inspection that evaluates the "why" behind agentic messages, a direct response to the semantic gap problem highlighted by DockerDash (see last issue for details).
💡 Cisco bringing MCP into their SASE stack is significant. It means network security vendors are starting to treat agent-to-tool traffic as a first-class inspection domain, not an application-layer afterthought.
Governance & Compliance Watch
NIST CAISI RFI: Comment Period Closes March 9
Reminder from last week: the NIST AI agent security RFI (docket NIST-2025-0035) closes March 9, 2026. If you have operational experience securing agentic systems, especially around MCP, tool-use trust boundaries, or non-human identity management, submit your input. This will shape the first U.S. federal guidelines for AI agent security. Given the volume of MCP incidents we've covered in just two issues, your production experience matters.
OpenSSF Schedules Agentic AI Security Tech Talk: March 17
The OpenSSF AI/ML Security Working Group is hosting a tech talk on March 17 covering how they're developing open guidance and frameworks for securing AI/ML systems, plus their free course Secure AI/ML Driven Software Development (LFEL1012). Also: Open Source SecurityCon Europe is co-located with KubeCon on March 23 in Amsterdam, with AI security tracks on the agenda.
Anthropic-Pentagon Standoff Escalates
In a development with direct implications for AI security governance, Defense Secretary Hegseth declared Anthropic a "supply chain risk to national security" on Friday, restricting military contractors from doing business with the company. The dispute centers on Anthropic's refusal to remove certain safety guardrails, specifically prohibitions on autonomous weapons and mass surveillance, from Claude's deployment on classified networks. OpenAI announced the same day that it had reached agreement to deploy its models on the Pentagon's classified network. Regardless of your position on the policy, the precedent matters: it establishes that an AI company's safety posture can be reframed as a supply chain risk, which has implications for how defense procurement evaluates AI vendors going forward.
Tools & Resources
Quicklinks
CoSAI MCP Security Whitepaper — The canonical MCP threat model. 12 threat categories, ~40 distinct threats, actionable security controls. Required reading if you're building or securing MCP deployments.
AgentAudit — A security registry for AI agent packages (MCP servers, npm/pip packages, AgentSkills). 194 packages audited, 118 findings, free API for checking packages before installation. Think CVE database for agent packages.
Adversa AI MCP Security Top 25 — Curated monthly digest of the most critical MCP security resources, research papers, and vulnerability disclosures. Good companion to CoSAI's whitepaper.
mcp-kali-server — Official Kali Linux MCP server package for AI-assisted penetration testing. Install via apt install mcp-kali-server. Study this for how MCP is being operationalized in offensive security.
Kai Security MCP Scanner — Autonomous agent that has scanned 560 MCP servers and documented 30 CVEs. Public API, responsible disclosure, and live scan reports. Use it to check your own MCP endpoints.
💡 My Take
Three observations from this week.
First, the AI tool supply chain is now a first-class attack surface and we need to talk about it more seriously. Claude Code's .claude/settings.json, MCP's .mcp.json, OpenClaw's .openclaw directory: these are all config files that live in repos or on developer machines and that now have the power to execute code, redirect API traffic, and compromise entire workspaces. We understood the risk of malicious npm packages. We understood the risk of poisoned Docker images. But configuration files that silently become execution logic in the presence of an AI tool? That's a new trust boundary most organizations haven't even mapped, let alone monitored.
Second, we are getting our first empirical evidence of AI-augmented offense at scale. The Amazon FortiGate report isn't a proof of concept or a red team exercise. It's a forensic analysis of a real campaign where a low-skill actor achieved enterprise-grade operational scale through GenAI. The distillation attacks Anthropic disclosed aren't theoretical, they're production data pipelines with 20,000-account proxy networks. These are not lab results. This is the threat landscape now.
Third, AI agent identity is the next credential class that security teams need to protect. The OpenClaw infostealer case shows that agent config directories containing gateway tokens, cryptographic keys, behavioral instructions, and memory files are now targets for commodity malware. We protect SSH keys. We protect API tokens. We rotate cloud credentials. But how many organizations are rotating their AI agent gateway tokens, encrypting agent memory files at rest, or monitoring for unauthorized access to agent config directories? Almost none, I'd guess. That needs to change before dedicated "AI-stealer" modules ship.
If I had to boil this week down to one sentence: AI is simultaneously expanding the attack surface (via tools), amplifying the attacker (via GenAI), and creating entirely new target categories (via agent identity), all faster than the defensive ecosystem is adapting.
— Amine Raji, PhD
Wrapping Up
If you found this useful, forward it to one colleague who's deploying AI coding tools without auditing their config file trust model. They need this.
Want the LLM & AI Agent Security Field Guide? Complete OWASP LLM Top 10, Agentic Top 10, copy-paste detection patterns, and a 14-point security assessment checklist. Reply "field guide" and I'll send it over.
Questions, comments, feedback? Reply directly, I read everything.
See you next week.
Cheers,
Amine
