How do you catch AI agent regressions after prompt or model changes?
I'm seeing a pattern where teams fix a failure in an agent, change the prompt or model a week later, and the same failure quietly comes back. Nobody catches it until a user does.
Curious how people are handling this today. Manual test cases? Evals? Logs? Nothing?
Not trying to pitch anything. Just trying to understand how widespread this is and what current approaches look like.

I'm handling this through a combination of tests and documentation-driven development ("scribe coding" instead of vibe coding):

* If it's important, it gets written into documentation somewhere: functional requirements, technical requirements, ADRs, lessons learned, etc.
* Code comments and docstrings point back to documentation, especially for bug fixes.
* Finally, bug fixes usually get new unit tests. The tests make sure that if the bug resurfaces, it gets caught immediately.

I absolutely believe what you're describing: I've heard other people talk about this. I just don't experience it myself (I'd like to hope that's because of the extra steps I'm taking).
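For anyone who wants something concrete, here is a minimal sketch of the "bug fixes get a test" habit applied to an agent. Everything in it is an assumption made for illustration: `run_agent` is a hypothetical stand-in for the real model call, the refund case is invented, and the doc path is a placeholder. The idea is only that each fixed failure becomes a recorded case that is re-run whenever the prompt or model changes.

```python
import unittest


def run_agent(user_message: str) -> str:
    """Hypothetical stand-in for the real agent call.

    In practice this would invoke the model with the current prompt and
    model settings; here it is a stub so the example runs on its own.
    """
    if "refund" in user_message.lower():
        return "I can help with that. Please share your order number."
    return "How can I help?"


# One entry per previously fixed failure. The "doc" field is a hypothetical
# path, mirroring the habit of pointing tests back to documentation.
REGRESSION_CASES = [
    {
        "id": "refund-asks-for-order-number",
        "doc": "docs/lessons-learned.md (hypothetical)",
        "input": "I want a refund",
        "must_contain": ["order number"],
        "must_not_contain": ["cannot help"],
    },
]


def check_case(case: dict) -> list[str]:
    """Return human-readable violations; an empty list means the case passes."""
    output = run_agent(case["input"]).lower()
    problems = []
    for phrase in case["must_contain"]:
        if phrase.lower() not in output:
            problems.append(f"{case['id']}: missing {phrase!r}")
    for phrase in case["must_not_contain"]:
        if phrase.lower() in output:
            problems.append(f"{case['id']}: contains forbidden {phrase!r}")
    return problems


class AgentRegressionTests(unittest.TestCase):
    def test_fixed_failures_stay_fixed(self):
        # subTest reports each case separately instead of stopping at the first failure.
        for case in REGRESSION_CASES:
            with self.subTest(case=case["id"]):
                self.assertEqual(check_case(case), [])
```

Saved as a test module, this can be run with `python -m unittest` after every prompt or model change. Substring checks are a deliberately crude choice here; a real agent with non-deterministic output would likely need looser assertions or repeated runs.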