LDB: Large Language Model Debugger via Verifying Runtime Execution Step by Step
HumanEval benchmark: 95.1 with GPT-3.5 (reported on the project's GitHub page, github.com)
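Roughly, the idea (as I understand it from the abstract): run the generated program on a test input, collect the runtime states block by block, and have the LLM verify each step against the task description to localize the bug. Here is a tiny Python sketch of that idea using `sys.settrace`; the tracer, the `candidate_sum_even` example, and the prompting step are my own illustration, not the paper's actual code.

```python
import sys
from types import FrameType


def trace_states(func, *args):
    """Run func under a line tracer and snapshot (line number, local variables)
    as each line is about to execute. A toy stand-in for LDB's per-block
    runtime state collection."""
    states = []

    def tracer(frame: FrameType, event: str, arg):
        if event == "line" and frame.f_code is func.__code__:
            states.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, states


# Hypothetical buggy candidate program from the code LLM:
# the task is "sum the even numbers in a list".
def candidate_sum_even(nums):
    total = 0
    for n in nums:
        if n % 2 == 1:  # bug: keeps odd numbers instead of even ones
            total += n
    return total


result, states = trace_states(candidate_sum_even, [1, 2, 3, 4])

# Each snapshot would be formatted into a prompt and the LLM asked whether
# that step is consistent with the task description; here it should flag
# the `n % 2 == 1` branch as the faulty step.
for lineno, local_vars in states:
    print(f"line {lineno}: {local_vars}")
print("result:", result)  # 4 instead of the expected 6
```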
I wonder if it could be combined with projects like SWE-Agent to build powerful yet open-source coding agents.
- https://paperswithcode.com/sota/code-generation-on-humaneval