MathNet: 30k competition math problems for AI mathematical reasoning benchmarking

mathnet.mit.edu

5 points by nill0 14 hours ago · 2 comments

nill0 (OP) 14 hours ago

Relevant article:

https://news.mit.edu/2026/mit-scientists-build-worlds-larges...

LeCompteSftware 7 hours ago

Hmm, I already found a typo in one of the solutions. I believe this was scraped from a bunch of PDFs in an unaudited automated process, so of course there are going to be some problems. But:

a) It doesn't bode well that I poked at three problems and already found an issue.

b) Even if it took 50 problems before my sampling paid off, there are 30,000 things to review here. I am not sure anyone actually took responsibility for even reading it, let alone making sure it was correct.

I am getting tired of doing basic sanity-checking on this stuff. Maybe I just got extremely unlucky and found one of the 300 problems with a typo. But I have been feeling awfully dejected at seeing so much garbage vibe code this year, and am not feeling particularly charitable toward this. If volunteer QA can find a problem with 5 minutes of not particularly close reading, then it doesn't seem like this is ready for release.
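The "extremely unlucky" argument can be checked with a little arithmetic. Below is a minimal sketch that assumes the comment's hypothetical figure of 300 flawed problems out of 30,000 (a 1% error rate) and treats each sampled problem as an independent draw. This is sampling with replacement, which is a close approximation when only 3 of 30,000 are checked. The function names are illustrative only.

```python
# Assumption: ~300 flawed problems out of 30,000 (1% error rate), taken from
# the comment's hypothetical; draws are treated as independent.

def p_at_least_one_flaw(flaw_rate: float, samples: int) -> float:
    """Probability that at least one of `samples` independent draws is flawed."""
    # Complement rule: 1 - P(every sampled problem is clean).
    return 1.0 - (1.0 - flaw_rate) ** samples


def expected_draws_to_first_flaw(flaw_rate: float) -> float:
    """Mean of a geometric distribution: expected draws until the first flaw."""
    return 1.0 / flaw_rate


rate = 300 / 30_000
print(f"P(flaw within 3 samples) = {p_at_least_one_flaw(rate, 3):.4f}")
print(f"Expected draws to first flaw = {expected_draws_to_first_flaw(rate):.0f}")
```

Under these assumptions, the chance of hitting a flaw within three samples is about 3%. You would expect roughly 100 draws before the first hit.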