FrontierScience Benchmark by OpenAI

openai.com

17 points by mustaphah 9 days ago · 1 comment

ursAxZA 7 days ago

If a model eventually scores perfectly on every benchmark yet ends up practically useless, what’s the next step?

Benchmarks measure competence inside a predefined problem space, but real scientific and engineering work isn’t bounded — it keeps changing underneath you.

At some point we don’t just need a system that knows how to solve problems in theory; we need one that can actually do something with that ability.

That's the equivalent of making the coffee when we want coffee, not just getting a perfect score on a coffee-theory exam.
