First AI Benchmark Solved Before Release: The Zero Barrier Has Been Crossed (h-matched.vercel.app)

Author here! While working on h-matched.com (tracking the time between a benchmark's release and AI achieving human-level performance on it), I just added the first negative datapoint: LongBench v2 was solved 22 days before its public release.
This wasn't entirely unexpected given the trend, but it raises fascinating questions about what happens next. The trend line approaching y=0 has been discussed before, but now we're in uncharted territory.
Mathematically, we can make some interesting observations about where this could go:

1. It won't flatten at zero (we've already crossed that).
2. It's unlikely to accelerate downward indefinitely (that would imply increasingly trivial benchmarks).
3. It cannot cross y = -x (that would mean benchmarks being solved before they're even conceived; see the sketch below).
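One way to make point 3 precise: let y be the solve gap (solved date minus release date) and let x be the release date measured from the benchmark's conception. "Solved before conceived" is then exactly y < -x, so every datapoint must satisfy y >= -x. A minimal sketch, with all dates entirely hypothetical:

```python
from datetime import date

# All dates below are assumed for illustration, not taken from the site.
conceived = date(2024, 10, 1)   # hypothetical: benchmark idea first exists
released  = date(2024, 12, 19)  # hypothetical: benchmark published
solved    = date(2024, 11, 27)  # hypothetical: AI reaches human level (pre-release)

y = (solved - released).days    # solve gap in days; negative = solved before release
x = (released - conceived).days # release date, measured from conception

# An AI can't solve a benchmark before the benchmark exists:
# solved >= conceived  <=>  solved - released >= conceived - released  <=>  y >= -x
assert solved >= conceived
assert y >= -x
print(f"y = {y} days, lower bound -x = {-x} days")  # y = -22, -x = -79
```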
My hypothesis is that we'll see convergence toward y = -x as an asymptote. I'll be honest: I'm not entirely sure what a world operating at that boundary would even look like. Maybe others here have insights into what existence at that mathematical boundary would mean in practical terms?
Since model capabilities do not change after release, shouldn't the model's release date count as the solved date? (In this case, o1-preview was released on September 12, 2024.)
You could flip it around like that. Here I've chosen the "Released Date" to be when the benchmark was published, and the "Solved Date" to be when an AI system first demonstrated human-level performance on that specific benchmark.
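For concreteness, here's how the two conventions diverge on this datapoint. The o1-preview date comes from the comment above; the LongBench v2 publication date and the date human-level performance was demonstrated are assumptions for illustration (the latter back-derived from the -22 figure):

```python
from datetime import date

benchmark_released = date(2024, 12, 19)  # assumed LongBench v2 publication date
human_level_shown  = date(2024, 11, 27)  # assumed demonstration date (release - 22 days)
model_released     = date(2024, 9, 12)   # o1-preview release (from the parent comment)

# Site's convention: Solved Date = when human-level performance was demonstrated.
print((human_level_shown - benchmark_released).days)  # -22

# Flipped convention: capabilities are fixed at model release, so backdate the
# Solved Date to the model's release. The gap becomes even more negative.
print((model_released - benchmark_released).days)     # -98
```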