Alphaproof paper (IMO 2024 Silver) is finally published in Nature [pdf]

nature.com

2 points by zuzatm a month ago · 1 comment

zuzatm (OP) a month ago

One notable difference from what one would expect from an LLM-RL paper is the use of test-time RL. I guess when you have a very strong verifier, you can specialize your network to solve only the problem at hand. Curious whether this can also be applied to natural language reasoning.
