February 15, 2026
Like everyone else with a keyboard and an anxious interest in their future as a programmer, I’ve been using AI agents to write a lot of code recently. You talk to Claude or Cursor and code appears and it mostly works, you fiddle with it a bit to get it right, and you push it. I have mixed feelings about the results, about what it means for the future, and most importantly about how it makes me feel, but this post isn’t yet another one in that vein.
Using agents means things are getting faster and crazier on the code generation side, but on the code review side, the bottlenecks preventing this generated code from actually making it to production seem largely intact. Even though everyone (including me last year) seems to like the analogy that prompts are to generated code what high-level languages were to assembly, there isn’t any tooling in place to actually make this happen.
People are not accountable for the code they generate. Their intent when generating the code is not recorded, and is lost in the vibes. It’s a particular shame because the natural language prompts encode this intent quite well: well enough for the LLM to combine it with its training set and produce the code for you.
So I had the idea for a tool, or maybe it’s better to think of it as a “thought experiment” because the workflows are still not entirely clear to me yet. It’s heavily inspired by git, and it tries to make this “the prompts are the source of truth, the code is just an artifact” thing a reality. Whether I actually want this reality, or whether it’s just bringing us closer to an apocalypse of meaning, is a different question altogether. But it was fun to build.
lit
lit is a version control system that treats LLM agent prompts as the source of truth for software projects. Generated code lives in a code.lock/ directory (“lockdir”) and is committed alongside prompts in git. The code generation itself is also handled by lit.
The name is a working title. It’s meant to simultaneously evoke git (its spiritual predecessor and storage layer), the word “literature” (natural language as the source of truth), and just generally “vibes”, because… you know. Unfortunately it’s also the name of a well-known web framework with 21k GitHub stars, so it’ll probably have to change.
Here’s what a prompt looks like in lit:
---
outputs:
  - src/models/user.py
imports:
  - prompts/models/base.prompt.md
---

# User Model

Create a SQLAlchemy model for a User with fields:
- `email`: String(255), unique, indexed, not null
- `hashed_password`: String(255), not null
- `full_name`: String(255), nullable
- `is_active`: Boolean, default True

Use the Base class from @import(prompts/models/base.prompt.md).

Include proper `__repr__` and a relationship to items.

The frontmatter at the top declares what files the prompt generates and what other prompts it depends on. That @import() is the key idea: lit builds a dependency DAG from the imports and generates code according to this graph: when prompt B imports prompt A, lit generates A first, then feeds A’s generated code as context when generating B.
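Something like this could sit at the other end of that import edge (an illustrative sketch, not the actual file from the demo):

---
outputs:
  - src/models/base.py
---

# Base Model

Create the SQLAlchemy declarative Base class that every model inherits from.
Export it from src/models/base.py so other modules can import it.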
Here’s what the structure of a lit repo looks like:
my-project/
  lit.toml                    # Config: language, framework, model
  prompts/
    models/user.prompt.md     # Source of truth
    models/base.prompt.md
    api/users.prompt.md
  code.lock/
    src/models/user.py        # Generated artifact
    src/models/base.py
    src/api/users.py

Why code.lock/?
The key insight, if there is one, is that LLM-generated code has the same problem as dependency resolution: the output is non-deterministic and expensive to produce, so you want to pin it. So code.lock/ is generated and committed to git as an artifact. It’s a “lockdir” containing your entire codebase, and you are encouraged not to look at it, the same way you never really open your package-lock.json file or uv.lock file, and it’s not such a big deal if it gets deleted (except here regenerating it will cost actual money in the form of LLM tokens).
Using lit
I don’t think people are going to be editing .prompt.md files in an orderly fashion instead of iterating live with an agent. Rather, lit is for what comes after that, when code needs to be maintained by a team, reviewed, and understood months later.
There are three workflows where I think this matters.
Post-hoc formalization of vibecoding. Vibecode something freely, and once it works, write the prompt that describes the intent: what this code should do, what assumptions it makes, what “contract” it fulfills. Then run lit regenerate to verify the prompt actually reproduces the code (this is the command that handles the actual LLM generation). Now you have a reproducible (insofar as LLMs can be) spec committed alongside the code.
This feels to me sort of like writing tests after prototyping something.
# After vibe-coding a feature that works:
vi prompts/auth/login.prompt.md # Describe the intent...
lit regenerate # Verify it reproduces.
lit diff --code # Compare generated vs hand-written
lit commit -m "Capture login intent"

Prompt-driven changes. Requirements change. Instead of asking an AI to “update this code” and hoping it gets it right, you change the prompt - the spec - and regenerate. The diff shows the change in intent, not just the change in code. Code review becomes review of requirements.
This is where the DAG becomes really useful. I added two lines to a user model prompt in the demo project and ran lit diff --summary:
=== Changes Summary ===
Prompts:
~ prompts/models/user.prompt.md (+2 -0 lines)
Impact (prompts that will regenerate):
-> prompts/models/user.prompt.md
-> prompts/models/item.prompt.md (imports user models)
-> prompts/schemas/user.prompt.md (imports user models)
-> prompts/api/users.prompt.md (imports user models, user schemas)
-> prompts/schemas/item.prompt.md (imports item models, user schemas)
-> prompts/tests/test_users.prompt.md (imports user schemas)
-> prompts/api/items.prompt.md (imports item models, user models, item schemas)
-> prompts/tests/test_items.prompt.md (imports item schemas, user schemas)
8 prompt(s) will regenerate, 4 unchanged

Prompts as documentation. A new developer reads prompts/ to understand intent, not just implementation. Each prompt file is a spec for what its generated code should do. The DAG shows how components relate.
lit debug dag # See how prompts depend on each other
cat prompts/api/users.prompt.md # Read the spec for the users endpoint

The demo
I built a complete CRUD API app using lit to demonstrate the idea. Twelve prompts generate the FastAPI application: models, schemas, API modules, test suites, database config, and package structure. Each prompt is 15–30 lines of natural language. The generated code is a working API with proper relationships, validation, pagination, soft deletes, and test coverage.
The entire application is reproducible. Clone the repo, set your API key, run lit regenerate, and you will get the same app.
Under the hood
When you run lit commit -m "message", it parses all the prompt files, builds the dependency DAG, detects which prompts changed since the last commit, computes the regeneration set (changed prompts plus their downstream dependents), and then generates according to the graph. Each prompt gets assembled into an LLM request with the project config, the generated code of its imports, and the prompt body. The LLM responds with file delimiters and lit parses the output into code.lock/.
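To make the “changed prompts plus their downstream dependents” step concrete, here is a rough sketch of how that set can be computed; the types and names are illustrative rather than lifted from lit’s actual source:

use std::collections::{HashMap, HashSet, VecDeque};

// Illustrative sketch (not lit's real code): given reverse edges of the import
// DAG and the set of prompts whose inputs changed, walk downstream and collect
// everything that has to regenerate.
fn regeneration_set(
    dependents: &HashMap<String, Vec<String>>, // prompt -> prompts that import it
    changed: &HashSet<String>,
) -> HashSet<String> {
    let mut to_regen = changed.clone();
    let mut queue: VecDeque<String> = changed.iter().cloned().collect();
    while let Some(prompt) = queue.pop_front() {
        for child in dependents.get(&prompt).into_iter().flatten() {
            // A prompt regenerates if anything it imports regenerates.
            if to_regen.insert(child.clone()) {
                queue.push_back(child.clone());
            }
        }
    }
    to_regen
}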
There’s input-hash caching a la Bazel (SHA-256 of prompt content + imported code + config) so unchanged prompts are skipped. There is also basic manual patch support, because sometimes you need a one-line fix and regenerating and burning tokens is overkill. lit saves your edit as a patch and reapplies it on top of future generations.
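And roughly how the input hash might be put together, assuming something like the sha2 crate (again, a sketch of the idea rather than the actual implementation):

use sha2::{Digest, Sha256};

// Illustrative sketch: the cache key hashes everything that can affect a
// prompt's output - the prompt body, the generated code of its imports, and
// the project config. If it matches the last generation, the prompt is skipped.
fn input_hash(prompt_body: &str, imported_code: &[String], config: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(prompt_body.as_bytes());
    for code in imported_code {
        hasher.update(code.as_bytes());
    }
    hasher.update(config.as_bytes());
    hasher
        .finalize()
        .iter()
        .map(|byte| format!("{byte:02x}"))
        .collect()
}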
Generation metadata is committed alongside everything else: tokens used, cost, model, a snapshot of the DAG at generation time. You can run lit cost --breakdown and see exactly what you’re spending per prompt:
=== Cost Summary ===
Total: $0.42 across 3 commits
Per-prompt breakdown:
prompts/api/users.prompt.md $0.08 (2,431 tokens)
prompts/api/items.prompt.md $0.07 (2,198 tokens)
...

Try it
git clone https://github.com/clintonboys/lit
cd lit
cargo install --path .

export LIT_API_KEY=sk-ant-...
mkdir my-project && cd my-project
lit init --defaults

Or look at the demo: lit-demo-crud.
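lit init writes the project config to lit.toml: the language, framework, and model that every generation request is assembled with. Something along these lines (the key names and values here are illustrative placeholders, not the real schema):

# Illustrative example - key names and values are placeholders.
language = "python"
framework = "fastapi"
model = "<your model id>"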
The full spec is in SPEC.md — nine sections covering the storage architecture, generation pipeline, DAG resolution, cost tracking, and design rationale.
It’s about 7,600 lines of Rust with 128 tests, and was itself largely vibe-coded with Claude. I did write the README and this blog post by hand though.
Limitations of v1
This is a proof-of-concept that I put together in a couple of days. It works and it’s interesting but it has a few major limitations, the main one being that currently every prompt must explicitly declare the files it will generate in its YAML frontmatter:
outputs:
  - src/models/user.py
  - tests/test_user.py

This means the prompt author needs to know the output file paths before the LLM generates anything. It works well when you’re formalizing existing code or when you have a clear project structure in mind and are OK with prompts and the files they generate being in 1-1 correspondence. But it is a lot more rigid than how most people actually use agents to write software.
This is a deliberate trade-off to get a prototype working. Declaring outputs up front is what makes the rest of lit work cleanly: the DAG can be resolved before generation, caching can be content-addressed, and no two prompts can accidentally claim the same file.
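That last point - no two prompts claiming the same file - is cheap to enforce precisely because outputs are declared up front. A sketch of the check (illustrative, not lit’s actual code):

use std::collections::HashMap;

// Illustrative sketch: because every prompt declares its outputs in frontmatter,
// conflicts can be caught before a single LLM call is made.
fn check_output_conflicts(prompts: &[(String, Vec<String>)]) -> Result<(), String> {
    let mut owner: HashMap<&str, &str> = HashMap::new();
    for (prompt, outputs) in prompts {
        for output in outputs {
            if let Some(existing) = owner.insert(output.as_str(), prompt.as_str()) {
                return Err(format!(
                    "{output} is declared by both {existing} and {prompt}"
                ));
            }
        }
    }
    Ok(())
}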
An obvious way to address this is “two-shot generation”: a cheap first pass asks the LLM “what files would you produce for this prompt?”, and lit records the answer as a “manifest”, which is also pinned in the repo. The second pass does the actual generation against that manifest. This preserves all the benefits of knowing outputs ahead of time (DAG resolution, caching, conflict detection) while removing the burden from the prompt author.
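A pinned manifest for the login prompt from earlier might look something like this (entirely hypothetical, since the feature doesn’t exist yet):

# Hypothetical manifest recorded by the first pass and pinned in the repo.
prompt: prompts/auth/login.prompt.md
outputs:
  - src/auth/login.py
  - tests/test_login.py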
For a tool like this to be actually used in production for large-scale code bases, maybe it would need to be aware of the AST of the code base in some way? I think that could be an interesting, but obviously much more involved, direction.
Please don’t hesitate to get in touch if you are interested in lit or you want to discuss the ideas here further: I’d love to hear from you!