Normally this newsletter leads with something I broke in the lab. This week is different.

Over the last few weeks I built a set of tools and field references for teams putting AI agents into production. I shipped them, put them online, and then did the thing every builder does: moved on to the next thing without telling anyone they exist. That's a mistake. They're useful, they're free, and you're the people they were built for.

So this is a release issue. One asset I think matters most, the rest in order of how fast they pay off, and exactly how each one helps you. No vendor PR, no sign-up wall beyond email where noted. Just the stuff, and what to do with it.

Start here: the Agent Security Scorecard

If you do one thing from this issue, do this one.

It's an interactive assessment that scores your AI agents against the OWASP Agentic Top 10 across five domains: Data and Training Governance, Model Security, Agent Controls, Supply Chain, and Detection and Response. Twenty questions, about twelve minutes. At the end you get a maturity score, a radar chart of where you're strong and where you're exposed, your top three risks, and a 30-day remediation roadmap.

The reason I'd start here rather than with a PDF is that this one is personalized. A static checklist tells you what good looks like in general. The scorecard tells you where your setup sits right now, and what to fix first. If you run agents in production and you've never put a number on your posture, this is the fastest way to get one.

Score your agents, then take the radar chart into your next security review. That single image tends to start better conversations than any report I've written.

The deep reference: AI Agent Security Field Guide (PDF)

This is the document I'd hand a security engineer who just got told "we're shipping agents next quarter, make it safe."

Twenty-plus pages mapping all ten OWASP Agentic Top 10 categories to real attack patterns and production-ready mitigations. For each category: the actual attack, then the specific controls that stop it. It's built as a field guide, not a whitepaper, so it's something you act from, not something you cite and forget.

The research behind it came from looking at how nine teams actually run agent security in production, including Anthropic, Microsoft, HolmesGPT, Cloudflare, GitLab, Palantir, Datadog, and Klarna. The mitigations in here are what survives contact with real deployments, not what sounds good on a slide.

Keep it open in a tab the next time you're threat modeling an agent.

The fast win: Agent Pre-Deployment Security Checklist (PDF)

Twenty-five controls across five families: probabilistic testing, supply chain, tool controls, injection defense, and sign-off. Each control is a checkbox, a one-line description, and a short note on why it matters.

This is the one to use the day before you ship. Run an agent down the list, and the gaps light up fast. It's short and scannable on purpose. Nobody reads a forty-page gate document the night before a release. They will tick twenty-five boxes.

If you only have ten minutes before a deployment review, this is your ten minutes.

The one that changes how you model: 5 Ways AI Breaks Threat Modeling (PDF)

Most threat modeling advice for AI is just the OWASP LLM Top 10 reprinted with a new cover. This one is different. It takes the STRIDE process your team already runs and shows the five specific places it quietly fails when the system you're modeling is an agent, plus the controls that fill each gap.

The framing I keep coming back to: your STRIDE doc is about 80% of the way there. This is the missing 20% for AI. If your team already threat models and you're wondering what you need to add rather than replace, start here.

For the identity conversation: AI Agent Identity Readiness Checklist (PDF)

Agent identity is the problem most teams haven't looked at yet and will be forced to soon. When an agent acts on your behalf, whose credentials does it carry, what's it scoped to, who can audit it, and what happens when it's retired?

This checklist covers twenty-five items across five categories: inventory, credential hygiene, scoping, delegation and audit, and lifecycle. It's the pre-flight you run before any agent touches production identity, and the starting point if you're staring down a migration to proper agent identity and don't know what to inventory first.

For incident readiness: AI Agent Containment Rubric (PDF)

When an agent does something it shouldn't at 2am, can your team contain it? This rubric lets you self-score across five dimensions (detection, isolation, response, communication, and improvement), each on a one-to-five scale with clear definitions at every level.

It's exec-legible, which is the point. You can run it in a tabletop, hand the scores to leadership, and everyone immediately understands where the gaps are. With the EU AI Act timelines bearing down, "can we contain an AI incident" is a question your board is going to ask. This gives you an honest answer before they do.

The whole set, at a glance

Asset | Format | Best for
Agent Security Scorecard | Webapp, ~12 min | Knowing where you actually stand
AI Agent Security Field Guide | PDF, 20+ pages | Deep reference while building
Pre-Deployment Checklist | PDF, 25 controls | The day before you ship
5 Ways AI Breaks Threat Modeling | PDF | Extending STRIDE for agents
Identity Readiness Checklist | PDF, 25 items | Before agents touch identity
Containment Rubric | PDF, 5 dimensions | Incident readiness, tabletops

What I'd actually do with these

If you're an engineer shipping agents: run the scorecard, then keep the field guide and pre-deployment checklist open while you build.

If you lead a team: run the containment rubric in a tabletop this quarter, and use the threat modeling guide to upgrade your existing process instead of replacing it.

If you're staring down the AI Act deadlines: the containment rubric and identity readiness checklist are the two that map most directly to what auditors and boards are starting to ask for.

All of it is free. None of it is gated behind a sales call. If something in here saves you a bad afternoon, that's the whole point.

Reply and tell me which one you used and what it caught. The real test of these is whether they hold up against your stack, and your findings make the next versions better.

See you next week, when I'm back to breaking things in the lab.

— Amine