New research shows RL may not help a model learn new basic skills
Reinforcement learning dominated the recent NeurIPS papers, but here's one that stood out to me about how exactly pre-training can affect post-training.
In other words, if the core skills (e.g. addition, subtraction) were not represented in the pre-training data, RL on complex math problems would not lead the model to develop improvements in those core areas.