Your coding agent, on a leash
An opinionated, full-stack framework & toolkit for high-quality software development using Claude Code. Architectural constraints, automated code quality checking & guidance for fixing issues. Helps Claude Code generate high-quality code. NOT an AI agent orchestrator!
BEGIN→write tests→watch tests fail→red→write code so tests pass→watch tests pass→green→REPEAT
Coding agent off leash:

With a leash:

I'm Chris. I've been a software engineer for over 20 years.
Claude Code writes all code for my product, ApprovIQ - with no em-dashes!
Agents go off the rails. They get distracted. Asking nicely with AGENTS.md and system prompts and interrupting their work to remind them of stuff... yeah, it works... sort of... most of the time. That's a problem, because "most of the time" doesn't cut it for building serious software.
CodeLeash puts guardrails outside the agent, so it can't go off the rails, forget them or ignore them. While everyone and their dog is building the tallest tower of agent orchestration they can, I've been tightly controlling outcomes in ApprovIQ's codebase by sharply defining what quality code looks like, rubbing Claude Code's nose in it with automation, and forcing it to fix its mistakes - all while I am off doing something else. CodeLeash enforces quality with code-based hooks, scripts, tools, and loops, in a way the agent can't ignore or bypass.
Look at the images. Notice: "Can't walk on the road" is NOT in a speech bubble! A leash means never barking orders.
Agents today cannot do everything you ask while also doing their task. Create guardrails that automatically show how they failed to live up to your standards while guiding them to correct their mistakes. Hence the analogy of a leash.
01
Enforcement of Test-Driven Development
Test-Driven Development is a universal way to build software. Write failing tests, watch them fail, write code to make them pass, watch them pass. You're gradually building a repository of every decision you ever made. Even better, if a decision is un-made, tests fail. Alarms go off.
Forcing the agent through TDD created a repository of all my micro decisions - I stopped needing to repeat past decisions to the agent. In time I found myself supervising the TDD process itself, while the agent built software according to its plan. I had gotten myself out of the loop, removing a layer of tedium.
But babysitting a TDD process is almost as tedious as doing TDD! I was constantly stopping the agent - don't do that, you didn't see the tests pass, roll that back, it's not time to write code yet. The agent was frequently befuddled by this. So I asked: can I get myself out of that loop too?
The solution Claude and I hit on is a state machine tracked in a local log file. You can see the state machine at the top of the page!
With CodeLeash, the agent MUST use TDD. A state machine enforces the Red-Green-Refactor cycle and blocks file edits until tests fail first. No shortcuts, no skipping ahead.
A test suite in which the agent has seen every test fail - then made each test pass by writing code - helps prevent regressions and keeps development velocity high as your product grows. Per-agent isolation lets agents run in parallel.
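Here's a minimal sketch of the idea, not CodeLeash's actual implementation. It assumes an append-only JSON-lines log and two hypothetical phases, `write_tests` and `write_code`; the class name `TddGuard`, the event names, and the rule that test files start with `test_` are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Hypothetical transitions: (current phase, observed event) -> next phase.
TRANSITIONS = {
    ("write_tests", "tests_failed"): "write_code",   # red seen, code may be written
    ("write_code", "tests_passed"): "write_tests",   # green seen, start the next cycle
}

# Assumption: each phase allows edits to exactly one kind of file.
ALLOWED_EDITS = {"write_tests": "test", "write_code": "source"}


class TddGuard:
    """Replays a JSON-lines log to work out the current TDD phase."""

    def __init__(self, log_path):
        self.log_path = Path(log_path)

    def phase(self):
        phase = "write_tests"
        if self.log_path.exists():
            for line in self.log_path.read_text().splitlines():
                event = json.loads(line)["event"]
                # Events that don't apply in the current phase leave it unchanged.
                phase = TRANSITIONS.get((phase, event), phase)
        return phase

    def record(self, event):
        """Append one event, e.g. after a test run finishes."""
        with self.log_path.open("a") as log:
            log.write(json.dumps({"event": event}) + "\n")

    def may_edit(self, path):
        """True if a file of this kind may be edited in the current phase."""
        kind = "test" if Path(path).name.startswith("test_") else "source"
        return ALLOWED_EDITS[self.phase()] == kind
```

A hook that runs before each file edit could call `may_edit()` and refuse the edit when it returns `False`. Because the state lives in a file rather than in the agent's context, the agent can't forget it, and giving each agent its own log file is one way to keep parallel agents from interfering.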
02
Code Quality Checks
CodeLeash checks code quality using many small scripts, not AI. No AI means zero tokens burned, no false positives, and no inconsistency. The checks are deterministic, so they're impossible to fool, and fast, so you can run them every time you change code.
Check scripts exit 0 on success or non-zero on failure. A failing check prints instructions telling the agent what to fix and where. Failure blocks the agent, which forces it to fix breakage right away.
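To make that contract concrete, here's a small sketch of what one such check might look like. The rule itself (no bare `except:` clauses) and the names `find_bare_excepts` and `run_check` are made up for illustration; they are not taken from the CodeLeash repo.

```python
import ast
from pathlib import Path


def find_bare_excepts(source, filename):
    """Return one actionable message per bare `except:` clause."""
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # A bare `except:` is an ExceptHandler with no exception type.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            problems.append(
                f"{filename}:{node.lineno}: bare 'except:' found. "
                "Fix: catch a specific exception type instead."
            )
    return problems


def run_check(root):
    """Check every .py file under root; return the process exit code."""
    problems = []
    for path in sorted(Path(root).rglob("*.py")):
        problems += find_bare_excepts(path.read_text(), str(path))
    for problem in problems:
        print(problem)  # the agent reads this to learn what to fix and where
    return 1 if problems else 0
```

Wired up as a script, the last line would be something like `sys.exit(run_check("."))`, so the shell sees 0 on success and non-zero on failure.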
Checks run before every commit, blocking problems from ever reaching your repository.
CodeLeash is full of examples for your coding agent to crib from. Some traverse the codebase with ASTs; others use regexes. A surprising amount of the code review feedback you've given in your software engineering career can be automated - ask your agent for ideas! With a big enough library of checks - built by you - by the time your agent stops working, all the basic issues have already been removed in response to failing checks, without you watching. You no longer need to see agent code with obvious flaws.
Never repeat obvious fixes to your agent again.
03
Self-Reflection System
Agents forget everything between sessions. Compaction destroys knowledge and sessions end. Hard-won insights vanish. The same mistakes are made again and again.
CodeLeash hooks force your agent to capture learnings in files before context is lost. When a session ends or context compacts, the agent writes structured notes about surprises, workflow friction, TDD discipline issues...
Learnings can be used to make permanent improvements to your codebase. A /learnings command completes the loop -- integrating the best insights back into the codebase as permanent improvements, and deleting the learnings files.
04
Strict Test Timeouts
Each unit test must complete within 10ms. Forces tests to be pure business logic - no I/O, no testing third-party code.
This makes it hard for tests to interfere with each other, which allows parallel test runs. If a test touches the network or spins up a server, it fails.
Auto-retry handles transient performance hiccups like JIT warmup or heavy import chains. When a test times out on retry, a flamegraph SVG is generated automatically so you can debug. And I was surprised to learn: agents can fix code based on a flamegraph!
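The retry logic can be sketched roughly like this. It is an illustration under assumptions, not CodeLeash's test runner: the function name `run_with_time_limit` and the single retry are hypothetical, the sketch measures duration after the test returns rather than interrupting it, and flamegraph generation is left out.

```python
import time

TIME_LIMIT_SECONDS = 0.010  # the 10ms budget described above
MAX_ATTEMPTS = 2            # assumption: one retry after a slow first run


def run_with_time_limit(test_fn, limit=TIME_LIMIT_SECONDS, attempts=MAX_ATTEMPTS):
    """Run test_fn and return (within_limit, durations).

    Only slowness triggers a retry. An assertion failure inside test_fn
    propagates immediately, because retrying would not make it correct.
    """
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        test_fn()
        elapsed = time.perf_counter() - start
        durations.append(elapsed)
        if elapsed <= limit:
            return True, durations
    # Every attempt was too slow: this is where a profiler could be invoked.
    return False, durations
```

A first run that is slow because of imports or warmup gets a second chance; a test that is slow on every attempt is reported as a timeout along with its measured durations.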
Your agent can test your entire product in under a minute. Let it move fast and break things - then fix them.
05
Worktree Isolation: Work On Many Features At Once
Parallel development in git worktrees with full isolation between them. Each worktree gets its own ports for everything.
Work on many features at once without conflicts. Spin up a new worktree, run ./init.sh && npm run dev, and enjoy a completely independent development environment with no port collisions.
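One way to give each worktree its own ports is to derive an offset from the worktree's path. The sketch below is a guess at the general technique, not what init.sh actually does; the base ports, slot sizes, and the name `ports_for_worktree` are illustrative assumptions.

```python
import hashlib
from pathlib import Path

# Illustrative defaults only; the real project may use different ports.
BASE_PORTS = {"vite": 5173, "api": 8000}
SLOTS = 100      # number of distinct offsets available
SLOT_WIDTH = 10  # spacing between neighbouring slots


def ports_for_worktree(worktree_path):
    """Map a worktree directory to a stable set of ports."""
    resolved = str(Path(worktree_path).resolve())
    digest = hashlib.sha256(resolved.encode()).digest()
    # The same path always hashes to the same slot, so ports survive restarts.
    offset = (int.from_bytes(digest[:4], "big") % SLOTS) * SLOT_WIDTH
    return {name: port + offset for name, port in BASE_PORTS.items()}
```

With only 100 slots, two worktrees can occasionally hash to the same offset, so a real setup would also need to detect a port that is already taken and pick another.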
06
Full-stack & Built To All Work Together
One repo has all your backend and frontend code. Simple data type sharing between Pydantic and TypeScript.
One server (uvicorn) serves your app in production. React assets are built with Vite, minified, and served statically.
Your agent can atomically commit backend and frontend changes in one commit. No more coordination across repos, no deploy ordering issues, and no IDLs. Just JSON-based REST APIs with shared data types between backend and frontend. Bliss!
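The page doesn't say how the types are shared, so treat the following as one possible approach rather than CodeLeash's mechanism. To stay dependency-free it uses a standard-library dataclass as a stand-in for a Pydantic model; the `Invoice` model and `to_typescript_interface` are hypothetical.

```python
from dataclasses import dataclass, fields

# Only flat models with these primitive field types are handled in this sketch.
TS_TYPES = {int: "number", float: "number", str: "string", bool: "boolean"}


@dataclass
class Invoice:  # hypothetical backend model
    id: int
    vendor: str
    total: float
    approved: bool


def to_typescript_interface(model):
    """Render a dataclass as the text of a TypeScript interface."""
    lines = [f"export interface {model.__name__} {{"]
    for field in fields(model):
        lines.append(f"  {field.name}: {TS_TYPES[field.type]};")
    lines.append("}")
    return "\n".join(lines)
```

Because backend and frontend live in one repo, generated output like this can be committed alongside the backend change that caused it, which is what makes the single atomic commit possible.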
Get the repo to see the hello world demo.
init.sh installs dependencies, starts Supabase, configures your environment, and installs the pre-commit hook. Then npm run dev starts Vite and FastAPI with hot reload.
$ git clone https://github.com/cadamsdotcom/CodeLeash
$ cd CodeLeash
$ ./init.sh && npm run dev
Database
Supabase, PostgreSQL
Frontend
React, TypeScript, Vite, Tailwind CSS
Testing
pytest, Vitest, Playwright
Observability
Prometheus, OpenTelemetry, Sentry
Why guardrails?
Agents need constraints, not freedom.
An unconstrained agent skips tests, makes sweeping changes, and produces code that works in isolation but breaks in context. The TDD guard exists because freedom doesn't scale.
Tests are the specs and the documentation.
Every test has a descriptive name. Test failures are the only time agents need to know what is in the test suite - no context pollution except when there's something to fix. And tests' contents tell your agent how things were meant to work.
Advanced linting you create, in code.
Write a Python script that walks your codebase's AST or checks code with regex. In fact, write lots of them. Go wild, because everything you codify reduces time spent explaining things to amnesiac agents and takes you out of one more loop. Every script means fewer things to tell your agent, more of your time spent sipping coffee, and better results by the time it stops working.
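A regex-based check can be even shorter than an AST one. This sketch uses a made-up rule (no leftover debug output) and hypothetical names; it only shows the shape such a script might take.

```python
import re

# Hypothetical rule: leftover debugging output must not be committed.
DEBUG_OUTPUT = re.compile(r"^\s*(print\(|console\.log\()")


def check_text(text, filename):
    """Return one message per offending line, each with a fix instruction."""
    messages = []
    for number, line in enumerate(text.splitlines(), start=1):
        if DEBUG_OUTPUT.match(line):
            messages.append(
                f"{filename}:{number}: debug output left in code. "
                "Fix: remove it or use the project's logger."
            )
    return messages
```

Regexes work across Python and TypeScript files alike, at the cost of being blind to syntax: the pattern above would also flag a matching line inside a multi-line string.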
Monorepo for maximum productivity.
Backend, frontend, database, migrations, lint rules, tests, CI, scripts and tools all happily coexist in one repo. Changes that cross boundaries are normal, not exceptional. Plenty of documentation helps your agent work effectively with the patterns. Technologies such as Vite and patterns such as initial_data were carefully chosen for a fast, static frontend that is served with all the data needed to hydrate it.