OpenAI Models Dominate Structured Code Edit Benchmark
blog.mentat.ai
I saw the same thing with Mailogy.
I only tested variants of GPT-3.5 and GPT-4, but got invalid syntax errors about 50% of the time with 3.5 and virtually none with 4.