Ask HN: How are people testing while using agent orchestrators?

1 points by spmartin823 3 months ago · 0 comments · 2 min read

I'm using Conductor and overall it's been a game changer for my productivity. The one hiccup is that their "Spotlight" feature, which is supposed to sync the worktree with my root and thus make testing locally possible, doesn't work reliably. Even if it did, it wouldn't be exactly what I need because I want each workstream to be able to test independently.

Three things I've tried so far, none of which are working well:

1. I used a Conductor setup script that runs my local dev setup in each worktree. This didn't work because of port collisions between docker containers.

2. I'm using terraform, so it was trivial to spin up a copy of my staging infra (with fewer resources) for every PR. This let each claude session in Conductor use Playright to test it's code. Two problems: first, this is pretty expensive ($2-5/per day/per pr). I'm pushing 20-30 prs a day, so this was costing me $XXX/month even with automated cleanups. Second, my deploy takes about 10-15 minutes, which isn't that long, but claude would often need to be re-prompted to check on the deployed changes.

3. For new features, I just had Claude yolo code to staging or prod behind feature flags. This caused regressions and requires that Claude have access to privileged data for testing, so not a great solution.

I'm thinking that something like local VMs tied to each worktree could make sense, but wanted to check if I'm just oblivious to an existing solution before diving into that.

No comments yet.

Settings

Ask HN: How are people testing while using agent orchestrators?

Keyboard Shortcuts