Show HN: Black-box API bug detection across 7 AI systems
resources.kusho.ai
Kind of wild that we're finally getting benchmarks for AI-generated API testing. Feels like the equivalent of SWE-bench, but for finding actual bugs instead of writing code.
Cool launch - let me try this with our in-house setup!
Interesting stuff.
Thank you. Would really appreciate feedback on methodology or the evaluation framework!