Settings

Theme

Benchmarking GPT-5

coderabbit.ai

9 points by aravindputrevu 6 months ago · 1 comment

Reader

aravindputrevuOP 6 months ago

We put GPT-5 through our Golden PR Dataset.

Here is the TL;DR

- GPT-5 outperformed Opus-4, Sonnet-4, and OpenAI’s O3 across a battery of 300 varying difficulty, error-diverse pull requests.

- GPT-5 scored highest on our comprehensive test and found 254 out of 300 bugs or 85% where other models found between 200 and 207 – 16% to 22% less.

- On our 25 hardest PRs from our evaluation dataset, GPT-5 achieved the highest ever overall pass rate (77.3%), representing a 190% improvement over Sonnet-4, 132% over Opus-4, and 76% over O3.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection