Dominic Rigby

Reinforcement Learning Pre-Training

Date: 11th June 2025

Key Points

Allow the model to reason when predicting the next token during pre-training.
This effectively turns any text corpus into an RL environment: each next-token prediction becomes a task with a checkable answer.
The same approach can also be used for fine-tuning.
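
A minimal sketch of how plain text could be framed as an RL environment, under assumptions these notes do not state: the reward is taken to be 1 for an exact match with the true next token and 0 otherwise, and `next_token_episodes`, `rollout`, and `toy_policy` are hypothetical names used only for illustration. A real policy would be a language model that emits a reasoning trace before its prediction; here a trivial stand-in is used so the example runs on its own.

```python
from typing import Callable, Iterator, List, Tuple

# A policy maps a context to (reasoning trace, predicted next token).
Policy = Callable[[List[str]], Tuple[str, str]]


def next_token_episodes(tokens: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield one (context, target) pair per position after the first token."""
    for i in range(1, len(tokens)):
        yield tokens[:i], tokens[i]


def reward(prediction: str, target: str) -> float:
    """Assumed reward: 1.0 for an exact match with the true next token, else 0.0."""
    return 1.0 if prediction == target else 0.0


def rollout(policy: Policy, tokens: List[str]) -> List[float]:
    """Run the policy on every position of the text and collect the rewards."""
    rewards = []
    for context, target in next_token_episodes(tokens):
        _reasoning, prediction = policy(context)  # reasoning is not scored directly
        rewards.append(reward(prediction, target))
    return rewards


def toy_policy(context: List[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for a model: 'reasons' that the last token repeats."""
    guess = context[-1]
    return f"the last token was {guess!r}, so predict it again", guess


if __name__ == "__main__":
    print(rollout(toy_policy, ["a", "a", "b", "b"]))
```

The rewards collected by `rollout` are what an RL algorithm would then use to update the policy; that update step is left out of this sketch.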