DeepSWE: Measuring frontier coding agents

deepswe.datacurve.ai

2 points by e2e4 a day ago · 1 comment

e2e4 (OP) a day ago

gpt-5.5xhigh leading the benchmark matches my recent experience. I've been an Opus 4.7 user, but it burns through tokens so quickly that I gave gpt-5.5xhigh (via Codex) a try. Quality was similar (if not better), and my tokens lasted a lot longer.