Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement Learning
Date: 15th July 2025
arXiv link
Key Points
- Aims for quality-diversity (QD), like MAP-Elites. QD aims to provide a diverse range of solutions across a descriptor space, where a descriptor is a high-level description of a policy.
- QD algorithms such as MAP-Elites tend to use evolutionary algorithms, which don't scale well to high-dimensional search spaces. This paper aims to unite QD and reinforcement learning.
- Their algorithm, DCRL-MAP-Elites, uses a descriptor-conditioned actor and critic: both take a target descriptor as an additional input. I.e. the policy is conditioned to reach a given target descriptor, so it finds different ways to solve the task for different targets.
- Descriptors are user-defined and describe the observed trajectories of policies. They allow both actor and critic to learn different niches for different areas of the descriptor space.
- Still maintains a population of policies, to which variation is applied in the following three ways:
  - Genetic algorithm mutation
  - Descriptor-conditioned policy gradients
  - Actor injection: the descriptor-conditioned actor is used as a generative model to create new policies, which are injected into the population
- Uses the standard MAP-Elites algorithm to ensure that we keep the best policy found for each area (cell) of the descriptor space.
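
To make the archive logic in the last point concrete, here is a minimal sketch of a plain MAP-Elites loop using only genetic-algorithm mutation. It is not the paper's implementation and leaves out the descriptor-conditioned actor, critic and actor injection. The toy `evaluate` function, the 2-D descriptor, `GRID_SIZE` and all other names are hypothetical choices made for illustration.

```python
import random

GRID_SIZE = 10  # hypothetical: number of cells per descriptor dimension


def evaluate(params):
    """Toy task (assumption): return (fitness, descriptor) for a parameter vector.

    Fitness is higher when the parameters are close to zero. The descriptor
    is the first two parameters rescaled and clipped into [0, 1].
    """
    fitness = -sum(p * p for p in params)
    descriptor = tuple(min(max((p + 1.0) / 2.0, 0.0), 1.0) for p in params[:2])
    return fitness, descriptor


def cell_index(descriptor):
    """Map a descriptor in [0, 1]^2 to a discrete grid cell."""
    return tuple(min(int(d * GRID_SIZE), GRID_SIZE - 1) for d in descriptor)


def try_insert(archive, params):
    """Keep `params` only if its cell is empty or it beats the current elite."""
    fitness, descriptor = evaluate(params)
    cell = cell_index(descriptor)
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, params)
        return True
    return False


def mutate(params, sigma=0.1, rng=random):
    """Gaussian mutation, standing in for the genetic-algorithm operator."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def map_elites(iterations=2000, dim=4, seed=0):
    """Run a basic MAP-Elites loop and return the archive of elites."""
    rng = random.Random(seed)
    archive = {}
    # Seed the archive with random solutions.
    for _ in range(20):
        try_insert(archive, [rng.uniform(-1.0, 1.0) for _ in range(dim)])
    # Repeatedly pick an elite, mutate it, and try to add the offspring.
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        try_insert(archive, mutate(parent, rng=rng))
    return archive
```

Calling `map_elites()` returns a dictionary mapping each filled cell to its `(fitness, params)` elite. In this framing, the other two variation operators would simply be additional sources of offspring passed to `try_insert`.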