First AI Benchmark Solved Before Release: The Zero Barrier Has Been Crossed

h-matched.vercel.app

2 points by mrconter11 a year ago · 3 comments

mrconter11OP a year ago

Author here! While working on h-matched.com (tracking time between benchmark release and AI achieving human-level performance), I just added the first negative datapoint - LongBench v2 was solved 22 days before its public release.

This wasn't entirely unexpected given the trend, but it raises fascinating questions about what happens next. The trend line approaching y=0 has been discussed before, but now we're in uncharted territory.

Mathematically, we can make some interesting observations about where this could go:

1. It won't flatten at zero (we've already crossed that)
2. It's unlikely to accelerate downward indefinitely (that would imply increasingly trivial benchmarks)
3. It cannot cross y=-x (that would mean benchmarks being solved before they're even conceived)

My hypothesis is that we'll see convergence toward y=-x as an asymptote. I'll be honest - I'm not entirely sure what a world operating at that boundary would even look like. Maybe others here have insights into what existence at that mathematical boundary would mean in practical terms?

  • qrios a year ago

    Since model capabilities do not change after release, shouldn't the model release date be the benchmark? (In this case, o1-preview was released on September 12, 2024)

    • mrconter11OP a year ago

      You could flip it around like that. In this case I have chosen "Released Date" to be when the benchmark was published and "Solved Date" to be when an AI system reached human-level performance on that specific benchmark.
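
The metric described in this thread can be sketched as a simple date difference: "Solved Date" minus "Released Date", in days. This is a minimal illustration, not the site's actual code. The function name `days_to_solve` is hypothetical, and the two dates are placeholders chosen only to reproduce a 22-day negative gap; they are not LongBench v2's real dates.

```python
from datetime import date


def days_to_solve(released: date, solved: date) -> int:
    """Days from benchmark publication to human-level AI performance.

    A negative result means the benchmark was solved before it was released.
    """
    return (solved - released).days


# Placeholder dates (assumed for illustration only, not real benchmark dates).
released = date(2025, 1, 23)
solved = date(2025, 1, 1)

print(days_to_solve(released, solved))  # -22
```

With this sign convention, every benchmark solved after publication gives a positive value, and the "negative datapoint" mentioned above is simply a case where the solved date precedes the released date.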
