Dominic Rigby

HRM-Agent: Training a Recurrent Reasoning Model in Dynamic Environments using Reinforcement Learning

Date read: 2nd November 2025

Introduces HRM-Agent: a Hierarchical Reasoning Model (HRM) trained with RL
Reasoning in RL is planning
Why?
- LLMs and model-based RL often suffer from not dynamically adjusting the amount of compute they use
- HRM learns to choose how much compute it should use
- Multi-level reasoning allows for multi-level planning
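
Toy sketch of the "variable compute" idea (my own illustration, not the paper's mechanism): a recurrent refinement loop that stops once its state stops changing, so harder inputs use more steps. The `refine` update, the tolerance and the hand-written halting rule are all assumptions; in HRM the decision to halt is learned rather than fixed like this.

```python
def refine(state, target):
    # One "reasoning" step (hypothetical): move halfway towards the target.
    return state + 0.5 * (target - state)


def reason(target, tol=1e-3, max_steps=50):
    """Iterate until the update is smaller than tol, or max_steps is hit.

    Returns the final state and the number of refinement steps used.
    """
    state = 0.0
    for step in range(1, max_steps + 1):
        new_state = refine(state, target)
        if abs(new_state - state) < tol:
            return new_state, step  # halt early: further steps change little
        state = new_state
    return state, max_steps


# A target close to the starting state needs fewer steps than a distant one.
_, easy_steps = reason(target=1.0)
_, hard_steps = reason(target=100.0)
print(easy_steps, hard_steps)
```

The compute budget is not fixed in advance here: it falls out of the halting test, which is the property the bullets above are about.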
They carry over the latent representations from the previous step, which reuses earlier computation and promotes consistency
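
Toy sketch of why carrying the latent state over can save work (again my own illustration under assumed dynamics, not the paper's model): when the environment changes only slightly between steps, starting from the previous latent `z` needs fewer refinement steps than resetting it. The names `refine`, `steps_to_converge` and `run_episode`, and the scalar "observations", are hypothetical.

```python
def refine(z, observation):
    # One recurrent update (hypothetical): move halfway towards the observation.
    return z + 0.5 * (observation - z)


def steps_to_converge(z, observation, tol=1e-3, max_steps=50):
    """Refine z until the update is below tol; return (z, steps used)."""
    for step in range(1, max_steps + 1):
        new_z = refine(z, observation)
        if abs(new_z - z) < tol:
            return new_z, step
        z = new_z
    return z, max_steps


def run_episode(observations, carry_over):
    """Total refinement steps over an episode, with or without latent reuse."""
    z = 0.0
    total = 0
    for obs in observations:
        if not carry_over:
            z = 0.0  # cold start: discard the previous step's latent state
        z, steps = steps_to_converge(z, obs)
        total += steps
    return total


# A slowly changing environment: each observation is close to the last.
observations = [10.0, 10.1, 10.2, 10.3]
cold = run_episode(observations, carry_over=False)
warm = run_episode(observations, carry_over=True)
print(cold, warm)
```

With carry-over, only the first environment step pays the full cost; later steps just correct for the small change.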
Tests it on path planning