Dominic Rigby
Reinforcement Learning Pre-Training
Date: 11th June 2025
arXiv link
Key Points
The model is allowed to reason (generate a chain of thought) before predicting each next token during pre-training.
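Because the correct next token is already in the text, each prediction can be checked and rewarded automatically. This effectively turns any text corpus into RL environments without needing human labels.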
The same approach can also be applied during fine-tuning.
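A minimal Python sketch of the idea above. It assumes whitespace tokens and a simplified exact-match reward. The paper's actual tokenization, reward definition and RL update are not reproduced. All names (NextTokenTask, tasks_from_text, reward, rollout, toy_policy, BIGRAMS) are hypothetical. The sketch shows how every position in a piece of text becomes a task whose answer can be checked automatically.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass
class NextTokenTask:
    """One RL 'environment' built from raw text: a context and its known next token."""
    context: List[str]
    target: str


def tasks_from_text(tokens: List[str]) -> Iterator[NextTokenTask]:
    # Every position in the sequence yields a task whose answer is already in the
    # corpus, so no human labels are needed.
    for i in range(1, len(tokens)):
        yield NextTokenTask(context=tokens[:i], target=tokens[i])


def reward(prediction: str, task: NextTokenTask) -> float:
    # Simplified verifiable reward (assumption): exact match with the true next token.
    return 1.0 if prediction == task.target else 0.0


# A policy maps a context to (reasoning, prediction). In the approach described
# above the reasoning would be generated by the model; here it is a placeholder.
Policy = Callable[[List[str]], Tuple[str, str]]


def rollout(policy: Policy, tokens: List[str]) -> List[float]:
    """Score one prediction per position; an RL algorithm would then use these rewards."""
    rewards = []
    for task in tasks_from_text(tokens):
        _reasoning, prediction = policy(task.context)
        rewards.append(reward(prediction, task))
    return rewards


# Hypothetical toy policy: guesses the next token from a fixed bigram table.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "on", "on": "a"}


def toy_policy(context: List[str]) -> Tuple[str, str]:
    last = context[-1]
    guess = BIGRAMS.get(last, "<unk>")
    return f"previous token is {last!r}", guess


tokens = "the cat sat on the mat".split()
print(rollout(toy_policy, tokens))  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

The policy update itself is omitted; a real setup would apply an RL algorithm to these rewards. The point is that the reward signal comes directly from the text.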