HRM-Agent: Training a Recurrent Reasoning Model in Dynamic Environments using Reinforcement Learning
Date read: 2nd November 2025
Paper link
Key Points
- Creates a version of the Hierarchical Reasoning Model (HRM) for RL: HRM-Agent
- In RL, reasoning takes the form of planning
- Why?
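- LLMs and model-based RL methods often cannot dynamically adjust how much compute they use
- HRM learns how much compute to spend: a learned halting decision controls how many reasoning iterations it runs
- Its two-level recurrence (a slow high-level module and a fast low-level module) allows planning at multiple levels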
- The latent state from the previous environment step is carried over, so earlier processing is reused and successive plans stay consistent
- Evaluated on path-planning tasks
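The two-timescale loop and the adaptive halting noted above can be sketched as a toy. This is not HRM's architecture: each module is a hypothetical scalar `tanh` update (HRM uses learned networks), the cycle counts are arbitrary, and the halting rule (stop once the high-level state stops changing) is an assumed stand-in for HRM's learned halting head, which is trained with Q-learning.

```python
import math

# Hypothetical scalar "modules"; HRM uses learned networks, not fixed tanh updates.
def low_level_step(z_low, z_high, x):
    """Fast module: runs every step, conditioned on the slow state and the input."""
    return math.tanh(0.5 * z_low + 0.3 * z_high + x)


def high_level_step(z_high, z_low):
    """Slow module: runs once per cycle, reading the final low-level state."""
    return math.tanh(0.4 * z_high + 0.5 * z_low)


def hrm_segment(z_high, z_low, x, n_cycles=2, t_steps=3):
    """One segment: n_cycles slow updates, each preceded by t_steps fast updates."""
    for _ in range(n_cycles):
        for _ in range(t_steps):
            z_low = low_level_step(z_low, z_high, x)
        z_high = high_level_step(z_high, z_low)
    return z_high, z_low


def reason(x, max_segments=16, tol=1e-4):
    """Run segments until the halting rule fires; return final states and segments used.

    Assumed stand-in halting rule: stop once the high-level state changes by less
    than `tol` between segments. HRM instead learns when to halt.
    """
    z_high, z_low = 0.0, 0.0
    for segment in range(1, max_segments + 1):
        new_high, z_low = hrm_segment(z_high, z_low, x)
        converged = abs(new_high - z_high) < tol
        z_high = new_high
        if converged:
            break
    return z_high, z_low, segment


for x in (0.0, 1.0):
    _, _, used = reason(x)
    print(f"input {x}: {used} segment(s)")
```

With zero input the state already sits at its fixed point, so one segment suffices; a nonzero input needs several segments before the halting rule fires, which is the sense in which compute varies per input.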
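Carrying state over between environment steps can be illustrated with a classical analogy rather than HRM-Agent itself: a grid planner whose distance-to-goal table plays the role of the latent, refined by relaxation sweeps until a sweep changes nothing. The grids, the `plan` function and the "door opens" change are all hypothetical. The warm start is only valid here because the change only opens cells, so the previous table still upper-bounds the new distances; closing a cell would need a different update.

```python
INF = float("inf")


def neighbors(grid, r, c):
    """Yield the 4-connected free cells next to (r, c)."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
            yield nr, nc


def plan(grid, goal, init=None):
    """Refine distance-to-goal estimates until a sweep changes nothing.

    Returns (distances, sweeps). Assumes `init` upper-bounds the true distances,
    which holds when the environment has only opened cells since `init` was computed.
    """
    rows, cols = len(grid), len(grid[0])
    if init is None:
        # Cold start: only the goal's distance is known.
        dist = [[INF] * cols for _ in range(rows)]
    else:
        # Warm start: reuse the previous step's estimates (the "latent").
        dist = [row[:] for row in init]
    dist[goal[0]][goal[1]] = 0
    sweeps = 0
    while True:
        sweeps += 1
        new = [row[:] for row in dist]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#" or (r, c) == goal:
                    continue
                best = min((dist[nr][nc] for nr, nc in neighbors(grid, r, c)), default=INF)
                new[r][c] = min(dist[r][c], best + 1)
        if new == dist:  # fixed point reached: stop spending compute
            return dist, sweeps
        dist = new


# Hypothetical dynamic environment: a door opens at the top of the wall.
before = ["G.#..", "..#..", "..#..", "..#..", "....."]
after = ["G....", "..#..", "..#..", "..#..", "....."]
goal = (0, 0)

old, _ = plan(before, goal)
cold, cold_sweeps = plan(after, goal)            # re-plan from scratch
warm, warm_sweeps = plan(after, goal, init=old)  # reuse previous estimates
print(cold_sweeps, warm_sweeps)  # prints: 9 7
```

Both runs reach the same distances, but the warm start stops after fewer sweeps because only the cells affected by the opened door need refining.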