Show HN: Proof Loop – I make my coding agents prove they finished the task

github.com

2 points by LeoStehlik a month ago · 3 comments · 2 min read

I built this because my coding agent kept telling me it had completed the task, but when I verified, that was not the case.

I made Proof Loop fairly light, intentionally. It’s basically a protocol helper script for AI agent tasks:

- set acceptance criteria before coding/implementation
- keep the builder and verifier roles separate
- test each criterion, with a result of PASS, FAIL or UNKNOWN
- attach evidence of done
- keep the proof evidence in the repo, so that the next agent / run can inspect it and see what was already done

You can try it from the command line: clone the repo, go to the proof-loop directory and run make demo.

The demo creates a task, checks the proof bundle, fails if evidence is missing, then passes once the acceptance criteria have evidence attached.
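
The check the demo performs can be sketched roughly as follows. This is not Proof Loop's actual code or file format: the `Criterion` class, its fields, and the `check_bundle` function are hypothetical names chosen for illustration, assuming each criterion carries a status and a list of evidence paths.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class Criterion:
    # Hypothetical shape; the real proof bundle format may differ.
    text: str
    status: Status = Status.UNKNOWN
    evidence: list[str] = field(default_factory=list)


def check_bundle(criteria: list[Criterion]) -> tuple[bool, list[str]]:
    """Return (ok, problems).

    The bundle passes only if every criterion is marked PASS
    and has at least one piece of evidence attached.
    """
    problems = []
    for c in criteria:
        if c.status is not Status.PASS:
            problems.append(f"{c.text}: status is {c.status.value}")
        elif not c.evidence:
            problems.append(f"{c.text}: PASS claimed without evidence")
    return (not problems, problems)


if __name__ == "__main__":
    # Illustrative criterion and evidence path, not taken from the repo.
    task = [Criterion("CLI prints version", Status.PASS)]
    print(check_bundle(task))  # rejected: PASS claimed without evidence
    task[0].evidence.append("proof/version-output.txt")
    print(check_bundle(task))  # accepted once evidence is attached
```

The point of the sketch is that a PASS without evidence is treated the same as a failure, so an agent saying "done" is never enough on its own.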

There is also an OpenClaw skill version now, so the easiest way to use it is:

openclaw skills install proof-loop

The GitHub repo also has a harness-agnostic version and examples.

I would especially like criticism and/or any feedback from people who run Codex, Claude Code or OpenCode on long-running multi-step tasks.

Note that this is a utility I use myself. It is free of charge, MIT-licensed and open source, with no intention of commercialization.

crionuke a month ago

For me, unit or even integration tests are a more reliable sign of whether the agent is done or not. And you?

  • LeoStehlik (OP) a month ago

    Same, but that comes next. I wanted proof first so that I don’t waste time running tests on code that is far from finished. Especially early this year, I’d get an agent confirming “I did this”, and later I’d discover it had struggled to use its tools and just said it was done. Only once I receive the evidence of “I’ve done it” (iterating if anything is missing) do I trigger the round of unit tests. I know this may sound like too much careful hand-holding, but I got burned so many times that it pays off.
