GPT-5.2 got worse on Terminal Bench 2.0, and so did GPT-5.2 Pro

1 point by xdotli 5 months ago · 1 comment

xdotliOP 5 months ago

tldr:
- gpt-5.2 and gpt-5.1-codex-max have identical pass rates but solve different tasks
  - 36 tasks common to both
  - 12 tasks unique to each model
- gpt-5.2-pro consistently underperforms by ~7-9 percentage points
- gpt-5.2-pro has significantly more timeout issues (26 vs 7-8)
- Extended timeouts recover additional passes
  - using 3x timeout multiplier recovers ~5-7 passes per model
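
For anyone who wants to run the same overlap check on their own results, here is a minimal sketch of how the common/unique split can be computed from two sets of passed task IDs. The task IDs and the `compare_passes` helper are made up for illustration; only the 36/12/12 split comes from the numbers above.

```python
def compare_passes(passed_a, passed_b):
    """Split two collections of passed task IDs into shared and model-specific sets."""
    a, b = set(passed_a), set(passed_b)
    return {
        "common": a & b,   # tasks both models solved
        "only_a": a - b,   # tasks only the first model solved
        "only_b": b - a,   # tasks only the second model solved
    }

# Hypothetical task IDs, chosen so the split matches the 36/12/12 figures above.
gpt_52 = {f"task-{i:02d}" for i in range(0, 48)}      # task-00 .. task-47
codex_max = {f"task-{i:02d}" for i in range(12, 60)}  # task-12 .. task-59

result = compare_passes(gpt_52, codex_max)
print(len(result["common"]), len(result["only_a"]), len(result["only_b"]))
# prints: 36 12 12
```

Both models pass the same number of tasks here (36 + 12 each), so the pass rates match even though a quarter of each model's passes are on tasks the other one fails.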
