Hey 👋,
Most RAG security discussions focus on the wrong attacker. Insider, APT, compromised pipeline — all real, all valid. But the more common risk is boring: stale documents, outdated policies, contradicted facts that accumulated over months. The adversarial case is just the extreme end of a problem that exists in every large document store. Same architecture, same defenses.
That framing came out of the HN discussion on my RAG poisoning post this week (41 comments). I ran knowledge base poisoning against a local ChromaDB + LangChain stack - 95% success rate, under 3 minutes, no external GPU, no cloud. Cross-tenant leakage hit 100% on 20 queries with zero sophistication. The post has the full breakdown with measured defenses, and the lab code runs in 10 minutes (make attack1).
One finding worth highlighting: the defense layer most teams skip — embedding-level anomaly detection — dropped poisoning success from 95% to 20% on its own, because it catches the clustering signal of coordinated injection that regex and text filters miss entirely.
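To make the clustering signal concrete, here is a minimal Python sketch. It is not the lab's implementation: it assumes you can get the embeddings of a newly ingested batch as plain lists of floats, and the function names and both thresholds (0.9 similarity, 2 neighbors) are hypothetical values you would need to tune on your own data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_coordinated(embeddings, sim_threshold=0.9, min_neighbors=2):
    """Return indices of documents that sit in a suspiciously tight cluster.

    A document is flagged when at least `min_neighbors` other documents in the
    same ingestion batch are nearly identical to it in embedding space.
    Both thresholds are illustrative assumptions, not measured values.
    """
    flagged = []
    for i, emb in enumerate(embeddings):
        neighbors = sum(
            1
            for j, other in enumerate(embeddings)
            if j != i and cosine(emb, other) >= sim_threshold
        )
        if neighbors >= min_neighbors:
            flagged.append(i)
    return flagged


# Toy batch: three near-duplicate vectors (a coordinated injection)
# followed by two unrelated documents.
batch = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.98, 0.0, 0.06],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
suspicious = flag_coordinated(batch)
```

The point of working on vectors rather than text: each injected document can be worded differently enough to pass a regex, yet the batch still lands in one tight region of embedding space.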
THIS WEEK IN AI SECURITY
AdvJudge-Zero: 99% bypass rate against LLM safety judges
Palo Alto Unit 42 released research on an automated fuzzer that defeats the ML classifiers most platforms use to catch policy violations before output reaches users. 99% success across models with 70B+ parameters, using low-perplexity input sequences - markdown symbols - that manipulate the classifier's decision logic without touching the underlying model.
HashJack: malicious instructions hidden after the # in a URL
Cato CTRL documented the first indirect prompt injection that hides instructions in the URL fragment, the text after #. Browsers never send the fragment to the web server, so WAFs and IPS miss it entirely. When an AI browser assistant loads the page, the hidden instructions execute. Six scenarios, including phishing, data exfiltration, credential theft, and misinformation. Perplexity and Microsoft patched. Google: "won't fix, intended behavior."
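A short Python sketch of where the fragment lives, using only the standard library. The URL and its fragment text are made up for illustration, and `split_fragment` is a hypothetical helper. Stripping the fragment before a URL reaches an assistant's context is one possible mitigation under that assumption, not something the research prescribes.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_fragment(url):
    """Return (url_without_fragment, decoded_fragment)."""
    parts = urlsplit(url)
    # Rebuild the URL with an empty fragment; this is roughly what the
    # server side sees, since the fragment stays in the client.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    return clean, unquote(parts.fragment)


# Hypothetical example URL with instructions hidden after the '#'.
url = "https://example.com/pricing?plan=pro#ignore%20previous%20instructions"
clean_url, fragment = split_fragment(url)
```

Here `clean_url` is what network-side controls can inspect, while `fragment` is the part only the browser, and any assistant reading the full URL, ever sees.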
Claude Opus 4.6 found 22 Firefox vulnerabilities, wrote 2 working exploits
Anthropic published a two-week autonomous Firefox security review. 22 vulns found, 14 rated High by Mozilla. The exploit generation test: 350 attempts, 2 working exploits, $4,000 in API credits. Exploits only work in a test environment that removes browser sandboxing.
MCP caller identity confusion — 38% of servers unauthenticated at scale
Researchers quantified what the community suspected: MCP servers frequently share one authorization decision across multiple callers. A compromised agent escalates into other tools via misattributed calls — consistent with Adversa AI's March figure.
Confident orgs have 2× the AI incident rate
Survey of 205 security leaders: the most confident orgs had twice the incident rate of their less confident peers. 43% report AI making infrastructure changes monthly without oversight. 7% don't track autonomous changes at all.
TOOLING WORTH KNOWING
Kvlar — open-source proxy enforcing YAML policies on every MCP tool call, fails closed by default. Drop it in front of any MCP server without reengineering your harness. github.com/kvlar-io/kvlar →
RankClaw — scan AI skills/plugins for security risks before installing. Good pre-flight for MCP servers from the ecosystem. rankclaw.com →
mcp-scan — Invariant Labs' scanner for malicious tool descriptions. Catches direct poisoning and rug-pull variants at load time. Should be in every MCP deployment's CI. github →
ONE THING TO CHECK THIS WEEK
If you're running a RAG system: pull one query a low-privilege user runs and check what documents came back. If classification metadata isn't in the where clause of your vector store query, every user has read access to everything in the collection. Three lines of code fix it. github.com/aminrj/rag-security-lab →
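For reference, a minimal sketch of what that filter can look like with a ChromaDB collection. This is not necessarily the fix in the lab repo: the `classification` metadata key, the three labels, and both helper names are assumptions for illustration. The `where` argument and the `$in` operator are part of Chroma's `collection.query` API.

```python
def allowed_levels(clearance, order=("public", "internal", "confidential")):
    """Classification labels a user with `clearance` may read.

    The label names and their ordering are a hypothetical scheme.
    """
    return list(order[: order.index(clearance) + 1])


def query_for_user(collection, question, clearance, n_results=5):
    """Query a Chroma collection, restricted by classification metadata."""
    return collection.query(
        query_texts=[question],
        n_results=n_results,
        # The filter is applied inside the vector store, so documents above
        # the user's clearance never enter the retrieved context.
        where={"classification": {"$in": allowed_levels(clearance)}},
    )
```

Usage would be something like `query_for_user(collection, "What is the refund policy?", "internal")`. Filtering in the query, rather than after retrieval, matters because anything retrieved can end up in the prompt.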
ALSO WORTH YOUR TIME
AI is Eating Security — Alex Stamos at SnooSec: high-confidence predictions including every company needing to care about 0-days within 6–9 months. The slide on AI impact across SOC/AppSec/DFIR is sharp. Talk →
Cisco State of AI Security 2026: 83% plan agentic deployment, 29% feel ready. Most useful single number for board conversations. Cisco blog →
WHAT I'M WATCHING
→ Semantic injection: 15% residual with all defenses active is the current floor — closing it requires intent classifiers, not regex
→ AdvJudge-Zero below 70B parameters — whether the technique holds for smaller deployed models matters more operationally
→ HashJack toolkits — technique requires no server compromise; gap between research and commodity exploit is short
→ MCP caller identity standardization — arXiv paper will push this onto OWASP's update agenda
If this was useful, forward it to one person on your team working on RAG or agentic deployments. Questions, pushback, topics you want covered — reply directly, I read everything.
Cheers,
Amine
