GPT-5.2 got worse on Terminal Bench 2.0, and so did GPT-5.2 Pro

twitter.com

1 point by xdotli 2 months ago · 1 comment

xdotliOP 2 months ago

tldr:
- gpt-5.2 and gpt-5.1-codex-max have identical pass rates but solve different tasks
  - 36 tasks common to both
  - 12 tasks unique to each model
- gpt-5.2-pro consistently underperforms by ~7-9 percentage points
- gpt-5.2-pro has significantly more timeout issues (26 vs 7-8)
- Extended timeouts recover additional passes
  - using a 3x timeout multiplier recovers ~5-7 passes per model
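For anyone who wants to run the same kind of overlap check on their own results, here is a minimal sketch in Python. It assumes you already have the set of passed task IDs for each model. The function name `compare_passes` and the task IDs are made up for illustration; they are placeholders, not real Terminal Bench task names or anything from the actual run logs. The set sizes are chosen only to mirror the 36 common / 12 unique split above.

```python
def compare_passes(passed_a, passed_b):
    """Split two collections of passed task IDs into shared and model-specific sets."""
    a, b = set(passed_a), set(passed_b)
    return {
        "common": a & b,   # solved by both models
        "only_a": a - b,   # solved only by the first model
        "only_b": b - a,   # solved only by the second model
    }


# Placeholder task IDs (hypothetical, for illustration only).
shared = {f"task-{i:03d}" for i in range(36)}
model_a = shared | {f"task-a-{i:02d}" for i in range(12)}
model_b = shared | {f"task-b-{i:02d}" for i in range(12)}

result = compare_passes(model_a, model_b)
print(len(result["common"]), len(result["only_a"]), len(result["only_b"]))
```

With 36 shared passes and 12 unique passes each, both models end up with 48 passes in total, which is how the pass rates can be identical even though the solved tasks differ.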
