Dominic Rigby

On the Design of KL-Regularised Policy Gradient Algorithms for LLM Reasoning

Date read: 30th September 2025

ArXiv Link

Key Points