SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Task (arxiv.org)
1 point by mohsen1 a month ago · 0 comments
No comments yet.