AlphaProof paper (IMO 2024 Silver) is finally published in Nature [pdf]

2 points by zuzatm 7 months ago · 1 comment

zuzatm (OP) 7 months ago

One notable difference from what one would expect of an LLM-RL paper is the use of test-time RL. I guess when you have a very strong verifier, you can specialize your network to solve only the problem at hand. Curious whether this can also be applied to natural-language reasoning.
