SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks arxiv.org 2 points by FiberBundle 2 months ago · 1 comment