Replay debugger for AI agents (fix failures without rerunning everything)

1 points by whitepaper27 4 months ago · 1 comment

Reader

Hi HN,

I recently spent ~6 hours debugging a multi-agent workflow.

Every time I fixed a bug, I had to re-run the entire pipeline — repeating LLM calls, hitting APIs again, and waiting minutes just to test one change.

So I built Flight Recorder.

It records agent execution, shows exactly what failed, and lets you replay from that point — skipping everything that already worked.

Demo (30 sec): https://github.com/whitepaper27/Flight-Recorder/blob/main/de...

Example:

    @fr.register("crm_lookup")
    def crm_lookup(email):
        return db.query(email)  # slow API call

    @fr.register("scorer") 
    def score_lead(contact):
        assert contact is not None  # bug here

    fr.run(pipeline, "unknown@example.com")

    # later:
    $ flight-recorder debug last
    root cause: crm_lookup returned None

    # fix the bug, then:
    $ flight-recorder replay last
    # crm_lookup is cached (not re-run)
    # scorer runs with your fix
    # SUCCESS in seconds

The key idea: Fix one step. Don’t rerun everything.

It’s open source (MIT) and still early — I’d really appreciate feedback from anyone building agent workflows.

GitHub: https://github.com/whitepaper27/Flight-Recorder

Settings

Replay debugger for AI agents (fix failures without rerunning everything)

Keyboard Shortcuts