Adapting Matt Pocock's grill-with-docs skill to create a Threat Modeling skill

I was working on a new app and came across Matt Pocock's /grill-with-docs skill, a structured interrogation tool that stress-tests your plan against your project's domain model. It cross-references CONTEXT.md and ADRs to catch terminology drift and resolve design decisions before you write a line of code. Any decisions that crystallize get written back into your shared context, keeping the codebase's language consistent over time. I found it really useful, and it got me thinking: this is exactly the kind of systematic, iterative process that threat modeling needs at the start of a design. So I created grill-threat-model, adapting the same pattern for security analysis.

The ADR Connection

grill-with-docs works by reading your README, domain glossary, and Architecture Decision Records (ADRs) before doing anything else. ADRs capture not just what architectural decisions were made, but why — the trade-offs, options considered, and context at the time. Things like why a particular auth mechanism was chosen, why a service boundary was drawn where it is, or why a third-party dependency was accepted.

grill-with-docs distills all of this into a CONTEXT.md that the agent uses to ground every response. And this is exactly what a threat model needs as input. When you enumerate threats, you're asking: what can go wrong given this trust model, this auth design, this data flow? ADRs give you the intent behind each architectural choice. If an ADR says "we chose JWTs over opaque tokens for statelessness," that's a prompt to check for algorithm confusion attacks, missing aud validation, and key rotation gaps. If it documents a decision to expose an internal service directly rather than through a gateway, that's a trust boundary worth scrutinizing.

So grill-threat-model picks up where grill-with-docs leaves off — it takes the same architectural understanding and turns it toward adversarial analysis. If your repo already has a CONTEXT.md, the threat model session starts with that context automatically.

What the Skill Does

The skill runs a five-phase workflow. It starts by reading everything — README, docs/, ADRs, Terraform/K8s configs, OpenAPI specs, entry points — and confirms scope, risk profile, and assumptions with you before generating a single threat. From there it writes threat-model/system-model.md with assets, trust boundaries, data flows, and a Mermaid DFD. Then it enumerates threats using STRIDE per component and boundary, enriched with ATT&CK technique IDs and CWE references, each stored as an individual file under threat-model/threats/.
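As a rough picture of what one enumerated threat carries, the sketch below models a single entry. The field names, the T-001 ID scheme, and the one-file-per-ID naming are assumptions for illustration, not the skill's actual schema. Only the STRIDE categories, the ATT&CK and CWE enrichment, and the threat-model/threats/ directory come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import PurePosixPath


class Stride(Enum):
    SPOOFING = "S"
    TAMPERING = "T"
    REPUDIATION = "R"
    INFORMATION_DISCLOSURE = "I"
    DENIAL_OF_SERVICE = "D"
    ELEVATION_OF_PRIVILEGE = "E"


@dataclass
class ThreatEntry:
    threat_id: str   # hypothetical ID scheme, e.g. "T-001"
    title: str
    component: str   # component or trust boundary the threat applies to
    stride: Stride
    attack_techniques: list[str] = field(default_factory=list)  # ATT&CK IDs
    cwes: list[str] = field(default_factory=list)               # CWE references

    def path(self) -> PurePosixPath:
        # One file per threat under threat-model/threats/ (file name is assumed).
        return PurePosixPath("threat-model/threats") / f"{self.threat_id}.md"


entry = ThreatEntry(
    threat_id="T-001",
    title="SSRF via internal proxy to the metadata service",
    component="internal-proxy",
    stride=Stride.INFORMATION_DISCLOSURE,
    attack_techniques=["T1552.005"],  # Cloud Instance Metadata API
    cwes=["CWE-918"],                 # Server-Side Request Forgery
)
print(entry.path())
```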

The walkthrough phase goes through threats one at a time, critical-first. For each one it surfaces existing controls with evidence links into the codebase, recommends a response (avoid / mitigate / accept / transfer), and waits for your decision before moving on.
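The critical-first ordering and the four response options can be sketched as follows. The severity scale beyond "critical" and the dictionary shape of a threat are assumptions; the skill's internal representation may differ.

```python
from enum import Enum


class Response(Enum):
    # The four response options named in the walkthrough phase.
    AVOID = "avoid"
    MITIGATE = "mitigate"
    ACCEPT = "accept"
    TRANSFER = "transfer"


# Assumed severity scale; only "critical" is confirmed by the post.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def walkthrough_order(threats: list[dict]) -> list[dict]:
    """Return threats sorted critical-first, keeping input order within a tier."""
    return sorted(threats, key=lambda t: SEVERITY_RANK[t["severity"]])


def record_decision(threat: dict, response: Response) -> dict:
    """Attach the user's decision to a copy of the threat (hypothetical shape)."""
    return {**threat, "response": response.value}
```

Because Python's sort is stable, threats of equal severity are walked in the order they were enumerated.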

One thing worth calling out: the skill enforces a clear separation between response (the decision) and mitigation-state (what's actually implemented). Choosing "mitigate" doesn't mean the threat is mitigated. It won't mark something fully-mitigated until you can point to the code, config, or manifest that implements the control and system-model.md reflects that control. Tickets don't qualify. That sounds strict, but it matters when pentest findings come back on items that were "resolved" only in a backlog.
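The evidence rule can be read as a simple gate. The sketch below is one way to express it, assuming a hypothetical Evidence record with a kind field; the skill itself applies this rule through its instructions rather than through code like this.

```python
from dataclasses import dataclass

# Evidence kinds that count as an implemented control. Tickets are excluded.
QUALIFYING_KINDS = {"code", "config", "manifest"}


@dataclass(frozen=True)
class Evidence:
    kind: str       # assumed values: "code", "config", "manifest", "ticket"
    reference: str  # path or link to the evidence


def can_mark_fully_mitigated(
    evidence: list[Evidence], system_model_updated: bool
) -> bool:
    """True only if an implemented control exists and system-model.md reflects it."""
    has_implementation = any(e.kind in QUALIFYING_KINDS for e in evidence)
    return has_implementation and system_model_updated
```

Note that the response chosen for the threat does not appear in the function at all: the decision and the mitigation state are tracked independently.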

Session state persists in threat-model/THREAT-MODEL.md, so you can pause and resume from where you left off. If architecture changes invalidate more than 25% of existing threats, it bumps the version and archives the old ones rather than letting them go silently stale.
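The 25% rule is plain arithmetic. In this sketch the function name and the way invalidated threats are counted are assumptions, and "more than 25%" is read as a strict inequality.

```python
def needs_new_version(
    total_threats: int, invalidated: int, threshold: float = 0.25
) -> bool:
    """Return True when the invalidated share strictly exceeds the threshold."""
    if total_threats == 0:
        return False  # nothing to archive yet
    return invalidated / total_threats > threshold
```

With 17 existing threats, 4 invalidated (about 23.5%) stays under the threshold, while 5 invalidated (about 29.4%) would trigger a version bump and archive.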

Example Output

The repo includes a complete example threat model I ran against Kubernetes Goat, an intentionally vulnerable Kubernetes lab. The agent enumerated 17 threats, covering wrong-cluster deployment risk, privileged pod escape to the operator host, SSRF via internal proxy to the metadata service, LAN exposure from port-forwarding bindings, and more. Each entry has the STRIDE category, existing controls with evidence, recommended actions, and the decision record. Three of the threats are critical, and all three were non-mitigated or partially-mitigated at the time of writing. It's a good reference for what the output looks like before you run it on your own repo.

Getting Started

You can run the skill against an application codebase, an IaC repo (Terraform, Pulumi, CDK), architecture docs, or a monorepo with all of the above. It'll work with whatever context it can find, but the more you give it — ADRs, OpenAPI specs, infra configs — the more grounded the threat entries will be. A repo with just a README will still produce output, but it won't be as precise.

Install as a project skill (run from the root of a clone of the skill repo):

mkdir -p /path/to/your-repo/.cursor/skills
cp -r .cursor/skills/grill-threat-model /path/to/your-repo/.cursor/skills/

Then in Cursor: "Run grill-threat-model on this repo" or "Threat model the API using the architecture docs."

Repo is here: github.com/nebulaa/grill-me-threat-model. The underlying methodology is documented separately and is worth reading first if you want the process before the tooling.

An earlier blog post on threat modeling, https://nebulablogs.com/threat-modeling-a-complete-how-to-guide, served as a reference for this skill. Feel free to check it out as well.