TD-MPC2: Scalable, Robust World Models for Continuous Control
Date: 25th May 2025
arXiv Link
Key Points:
- Extends TD-MPC (Temporal Difference Model Predictive Control) into TD-MPC², a multi-task RL framework using a world model for planning in latent space.
- Achieves strong performance across 104 continuous control tasks without per-task hyperparameter tuning.
- Demonstrates that a single 317M‐parameter world‐model agent can learn to solve 80+ tasks spanning diverse domains and embodiments.
- Utilizes MuZero‐like latent‐space planning that iteratively refines action sequences via learned dynamics.
Key Methods:
- World model architecture: decoder-free latent dynamics with MLPs (LayerNorm + Mish), SimNorm on latent states, and an ensemble of Q‐functions with Dropout.
- MuZero‐style planning: perform latent‐space rollouts to evaluate action sequences; encoder maps observations directly to LSTM‐based latent states.
- Shared hyperparameters for all tasks: same learning rate, network sizes, planning depth, etc., enabling out-of-the-box multi-task generalization.
- Open-source code and benchmarks: publicly available repository for reproducibility.