Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

arxiv.org

6 points by mauriziocalo 9 months ago · 1 comment

galaxyLogic 9 months ago

> Our results reveal that all tested models struggled significantly, achieving less than 5% on average
