SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks (arxiv.org)
1 point by FiberBundle 21 hours ago · 1 comment
No comments yet.