New research shows RL may not help a model learn new basic skills
Reinforcement learning dominated the recent NeurIPS papers, but here's one that stood out to me about how exactly pre-training can affect post-training.
In other words, if the core skills (e.g. addition, subtraction) were not represented in the pre-training data, RL on complex math problems would not lead the model to develop improvements in those core areas.