We found more than 100 previously unknown vulnerabilities across Sentry and a few other open source projects with Warden.
I’m here once again to convince you this is worth your time, and to explain our approach.
#What is Warden
Warden is a simple harness-on-a-harness that creates a set of tools and workflows on top of existing (or purpose-built) skills, turning them into what looks like your typical code review agent. It includes a command line tool that can run these skills against a set of changes, or even an entire codebase, as well as mechanisms to do the same within GitHub Pull Requests. The simplest way to think about it: Warden is a framework for using LLMs to find bugs. It is licensed as Fair Source.
It does this by building on top of the Claude SDK. It wraps your skill with a meta prompt that constrains its behavior, and includes a number of other capabilities like deduplication and verification - both LLM-driven. The output ends up dual-optimized for humans and machines, giving you a pleasant terminal-based interactive experience as well as a schema-bound JSONL capturing the results.
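Because the output is machine-readable, it's easy to post-process findings with a few lines of code. Here's a minimal sketch of what that could look like, assuming hypothetical field names like severity, file, and title - the actual schema Warden emits may differ:

```python
import json

# A minimal sketch of post-processing Warden's JSONL output.
# The field names used here ("severity", "file", "title") are assumptions
# for illustration; check the schema your Warden version emits before
# relying on them.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def summarize(log_path: str) -> None:
    with open(log_path) as fh:
        findings = [json.loads(line) for line in fh if line.strip()]
    findings.sort(key=lambda f: SEVERITY_ORDER.get(f.get("severity", ""), 99))
    for finding in findings:
        print(f'{finding.get("severity", "?"):>8}  {finding.get("file", "?")}: {finding.get("title", "")}')


if __name__ == "__main__":
    summarize("warden-findings.jsonl")  # placeholder path
```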
You can use it to enforce anything that you can connect to code, but we find it most effective at things like vulnerability detection or bug identification. If you can describe it with a skill, Warden can help you enforce it.
#Results Speak
If you’ve been following our work here, you might remember we found a dozen Sentry-scoped vulnerabilities with the first pass of Warden some months back. None of them were critical, but it proved that leveraging inference alongside our traditional mechanisms was going to be valuable. Since then we have been running Warden across a number of repositories at Sentry, mostly finding small bugs here and there in Pull Requests. More recently I wanted to experiment with another technique that seemed to be more reliable with models: scoping the goal to a much narrower set of conditions. That meant taking our broad “security vulns” skill and narrowing it down to “this specific security vuln”. We combined this with a form of meta-optimization that looks quite a lot like RL - I’ve written about this before and I call it Synthesis - and the results are very, very good.
In this pass we found 60+ vulnerabilities in Sentry, primarily around auth/authz, including full authentication bypasses as well as a variety of insecure direct object references and otherwise wrongly scoped permissions that allowed legitimate destructive actions. They were all off the critical paths, and the impact was limited, but they were issues we would have absolutely paid bounties out for.
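To give a sense of the class of bug this skill targets, here's a contrived sketch (not an actual Sentry finding) of a wrongly scoped permission: the endpoint authenticates the caller, but never verifies that the object being destroyed belongs to their organization.

```python
# Illustrative only - a contrived Django REST Framework endpoint, not code
# from Sentry or any of the projects mentioned here. The caller is
# authenticated, but nothing verifies they belong to the organization that
# owns the project, so any logged-in user can delete any project by id.
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.views import APIView

from myapp.models import Project  # hypothetical model


class ProjectDeleteView(APIView):
    permission_classes = [IsAuthenticated]  # authn only - no authz scoping

    def delete(self, request, project_id):
        project = Project.objects.get(id=project_id)
        # Missing: verify request.user is a member of project.organization
        # before allowing a destructive action.
        project.delete()
        return Response(status=204)
```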
The cost of all of this? The skill itself usually takes tens of dollars in token cost to build, and there’s incremental refinement from there. The major costs come from backfilling and iteration - we hit just around $1,000 with opus-4.7 running the single authz skill against our backend Python code (this is the getsentry/sentry repo).
This time around I also wanted to see what this would look like in some other projects. I’m convinced this is a big unlock for teams, and so I figured if I could show this wasn’t just Sentry it might spark some more interest. I ran this against some big open source projects that fit my criteria: highly complex and well adopted, some kind of large scoped backend service with auth, and active and old enough to almost certainly have a lot of gaps.
In Home Assistant’s core, with a partial scan focused on third-party components (it’s a large repo!), we reported 52 findings. This took a little bit of refinement on the skill as I’m not familiar with the codebase, but I spot-checked with maintainers and they validated the findings were real (albeit not always that important given what HASS does).
In Discourse we scanned the whole Ruby app and came out with a total of 7 findings after two fine-tuning passes. Discourse has been pretty eager about toolchain adoption and the surface area is smaller than HASS, so it doesn’t totally surprise me that we didn’t find as many.
I won’t be sharing the details of these for obvious reasons, but Home Assistant gave us permission to share an example of the outcomes.
Overall I’m very proud of what you can accomplish using this technique, and it’s easy enough to teach, so that’s what we’re here to do today.
#Building an Effective Skill
Warden will take your skill and wrap it up with some syntactic sugar and a little bit of coercion, but the power is still coming from that skill. I’ve found that the more targeted you make it, the better it will perform. Within the context of security this looks like using a skill per set of security concerns rather than a generalized one - for example, a "find-rces" skill vs a "find-security-vulns" skill.
To build the skill we’re going to use Sentry’s skill-writer. I have a methodology that says if something is going to change or be needed often, it must be done via automation or repeatable tooling. In this case, that automation is our skill-writer - a skill that synthesizes a number of best practices, and which we use to simplify building and maintaining skills (including itself).
Choose a location for your skills. This can either be a global repo, if you want these to be reusable across projects, or it can live within the project’s repo itself (in that case I recommend putting them in a skills/ directory in that repo). In this walkthrough I’ll use a dedicated repository.
Open your favorite coding agent and let’s kick it off:
I’m setting this folder up as a new repo which will contain skills for our warden project (warden.sentry.dev). These are going to be generalized skills (each in ./skills/skillName) which we’ll use for things like bug detection, security concerns, etc. Help me get some scaffolding, docs, AGENTS.md, etc in place. You also should use dotagents.sentry.dev and setup the skill-writer skill from the getsentry/skills repo. Make the license MIT in this repository.
You may want to clean some of the repository up at this point, but you should have a basic new project set up: you should see an agents.toml with skill-writer in it, and a skills/ directory that we’ll be working out of.
In our AGENTS.md, when we build Warden skills, we should always find examples for both Python and TypeScript projects, and use them to understand and encode patterns. These skills should also be designed to trace concerns fully. We will also prefix all of these skills with ‘wrdn-’, and they will generally use the Read, Grep, Glob, and Bash tools.
Now you just need to kick off synthesizing a new skill. In our case we’re going to focus on authentication bypasses:
Create a security skill that’s focused on identifying authentication bypasses, permission issues, or similar high-criticality concerns. Identify common frameworks and the patterns that we should be looking for. For each framework, or larger set of concerns, break them up into their own references file. Look up prior art online for best practices on identifying these concerns and what the common mistakes are. Additionally, sift through the ~/src/sentry repository for similar reference material.
You may find you want to push it in a certain direction, or it needs heavier steering to get it to search online for better references. You may also find that you need to feed it a list of technologies or frameworks that matter more, or some better prior art online (there are both good and bad resources).
When it’s done, skim through the skill to make sure it looks like what you’d expect. It’s going to be a fat skill, but if it hasn’t been segmented well enough - that is, if it loads information it may never need - then you should tell your agent to improve on that (e.g. Python concerns should live in a references file if you’re not always evaluating Python code).
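For reference, the shape I tend to end up with looks roughly like the tree below. The file names are just illustrative - skill-writer may organize things differently:

```
skills/
  wrdn-access-control/
    SKILL.md              # core instructions: what to look for, how to trace, how to report
    references/
      python-django.md    # Django/DRF auth patterns and common mistakes
      node-express.md     # Express/TypeScript middleware patterns
      generic-authz.md    # framework-agnostic IDOR and permission-scoping checks
```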
#Testing and Iterating Locally
To test this I find it’s easiest to install Warden system-wide:
```bash
npm install -g @sentry/warden
```
Pop over to your repository where you’re going to QA the skill, identify a path of critical code (I like to test on our API endpoints), and run Warden on it:
```bash
warden --skill ~/src/warden-skills/skills/wrdn-access-control "src/sentry/api/endpoints/**/*.py"
```
This can be both expensive and slow. Warden by default is going to use your Claude credentials (via the Claude SDK). You’ll see it sift through a bunch of chunks across the files - it will split one file into smaller chunks, similar to how a pull request would look - and you might notice findings come up. Everything that happens here gets logged to a typed JSONL file that you can go back and revisit with `warden logs`. More importantly, you can feed the whole thing into your favorite coding agent, which is what we’ll do next:
I have a set of findings that I want you to verify. Use a subagent to explore each one, trace it fully, and verify the finding. The findings are available in [logfile path].
There’s a handful of ways you can use the finding data, but I find having a second-pass agent go over it valuable. You’ll be able to dig deeper and fine-tune things, as it’s not uncommon for Warden to get the severity wrong, or to fail to fully trace the problem. You might also find that some findings are false positives; this is where we iterate. Take any false positive, dump it back into the session where you generated the skill file (or a new session if you must), and ask it to evaluate what should change to improve the results. Make sure it uses skill-writer again if it’s a fresh session.
Do this until you’re satisfied, but honestly I find the better the model, the less you need to micromanage the skills (no different from prompts). I almost always find the first few sessions have false positives - either findings classified as worse than they are, or not traced deeply enough. Just run subjective passes until you find the accuracy is good enough. It will never be 100%, but I find you can get to 95%+ with only a few iterations. Often I stop after 1-3 findings, then feed those results back into an active agent session to validate them and use the learnings to reinforce the skill.
Just as a note, when you’re reading this post, it’s possible we’ve already shipped our broader SkillsKit (skillet at the time of writing). This is intended to make it easier to bundle up the act of building, improving, and evaluating (via evals) skills.
#Running On Pull Requests
Warden tries to solve two concerns:
- Finding pre-existing issues
- Preventing them from happening in the future
The latter means running it on all forward-looking code changes, and we provide a native capability to do this via GitHub Actions. If you’re not using GitHub you can certainly still instrument things, but we’re big GH users ourselves.
In this scenario you can think of Warden similarly to all of the code review bots you’re probably far too familiar with, except you control the focus. It will scan the PR diff, using a similar chunking strategy to what it does locally, and will annotate the PR with any findings. Additionally it will deduplicate its own findings, as well as avoid annotating anything that has already been identified by another bot. In practice, unfortunately, Warden usually finds things first and the other bots still flag their findings, but it’s the thought that counts, right?
Setting this up is pretty simple, and we recommend doing it organization wide at your company:
- Go to Organization Settings → Secrets and variables → Actions
- Add the following secrets:
  - `WARDEN_ANTHROPIC_API_KEY`: Your Anthropic API key from console.anthropic.com
  - `WARDEN_MODEL` (optional): Override the default model (e.g. claude-opus-4-6)
  - `WARDEN_SENTRY_DSN` (optional): Sentry DSN for error and performance telemetry
Now Warden is ready to go in each repository where you want to use it; all you have to do is configure it. You can do this with `npx @sentry/warden init`, which will automatically generate the baseline workflow.
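If you want a sense of what that baseline looks like before running init, here’s a rough hand-written sketch. The command, flags, and secret wiring are assumptions on my part - treat the file that init actually generates as the source of truth:

```yaml
# Sketch only: the command, flags, and secret wiring below are assumptions,
# not Warden's documented configuration. Run `npx @sentry/warden init` to
# generate the real baseline workflow.
name: Warden
on:
  pull_request:

jobs:
  warden:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write   # needed to annotate the PR with findings
    steps:
      - uses: actions/checkout@v4
      - run: npx @sentry/warden review --skill skills/wrdn-access-control
        env:
          ANTHROPIC_API_KEY: ${{ secrets.WARDEN_ANTHROPIC_API_KEY }}
```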
We are also exploring running Warden globally via the .github repository, where we could enforce global skill checks only when certain things exist in a repo (like scanning for vulns in any changes to a GitHub Workflow).
#Fin
I hope you try Warden, and I hope you start asking “what comes next” beyond code generation. It’s not dozens of agents coordinating to generate more slop - it’s verification, detection, improving what we can already do. Every time I pick this project up to improve it, I find more active vulnerabilities in our codebases. If you take the time to try this, I am almost certain you will find them in yours too. Yes, inference is expensive, but it’s cheaper than the bounties we would have paid out.
Please be responsible. You’ll find our example skills on GitHub, such as the one that found a large number of Sentry vulnerabilities. If you use this for evil, human-enforced karma is a thing.
Give Warden a spin, tell me how it works, and if you find anything that you think could be better, drop a note on GitHub.
p.s. Mythos is FUD