We reproduced DeepSeek R1-Zero in the CountDown game, and it just works.

Through RL, the 3B base LM develops self-verification and search abilities all on its own.

You can experience the "aha moment" yourself for < $30.
Code: https://t.co/B2IsN1PrXV

Here's what we learned 🧵 https://t.co/43BVYMmS8X
