Capture the Flag
Date: 2025-06-04
arXiv Link: Need to find link
Key Points:
- Introduces a multi-agent RL environment (“Capture the Flag”) and trains populations of agents in it, each combining a fast and a slow LSTM.
- Implements a two-tier optimization: each agent optimizes its policy against an internal reward function, while an external optimization process adjusts those internal rewards to maximize team win rate.
- Demonstrates emergent cooperative strategies: agents learn to coordinate roles (offense vs. defense) by balancing internal exploration with external competitive reward.
- Highlights that an agent’s “internal” objective (e.g., collecting flags efficiently) can diverge from the “external” objective (winning the match), which motivates the nested RL loops formalized below.
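In symbols, the nested objective can be written roughly as follows (notation mine, not the paper’s): the inner loop fits a policy to an internal reward $r_{\text{int}}(\cdot\,; w)$, and the outer loop tunes the weights $w$ against win probability:

$$
\pi_w^{*} = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t}\, r_{\text{int}}(s_t, a_t; w)\right],
\qquad
w^{*} = \arg\max_{w}\; \Pr\!\left(\text{win} \mid \pi_w^{*}\right)
$$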
Key Methods:
- Fast-LSTM policy network: processes high-frequency observations (e.g., immediate surroundings) for quick reflexes.
- Slow-LSTM “strategy” network: processes longer-term context (e.g., team positions, flag locations) to set internal sub-goals; a minimal sketch of the fast/slow pairing follows this list.
- Two-tier optimization (see the loop sketch after this list):
  - Internal RL loop: each agent maximizes its internal reward (e.g., proximity to the enemy flag, survival time).
  - External RL loop (“meta-optimizer”): adjusts internal-reward weights across the population to maximize global win rate.
- Population-based training: regularly introduce new agents, replace poorly performing ones with mutated copies of stronger ones, and evolve the internal-reward parameters.
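A minimal sketch of the fast/slow pairing, assuming PyTorch; the class name, the `slow_period` gating, and the simple concatenation between the two cores are my illustrative choices, not the paper’s exact architecture:

```python
import torch
import torch.nn as nn

class FastSlowPolicy(nn.Module):
    """Two-timescale recurrent policy: a slow 'strategy' core that ticks
    every `slow_period` steps, and a fast core that reacts every step
    while conditioning on the slow core's hidden state."""

    def __init__(self, obs_dim, n_actions, hidden=256, slow_period=8):
        super().__init__()
        self.hidden = hidden
        self.slow_period = slow_period
        self.encoder = nn.Linear(obs_dim, hidden)
        self.slow = nn.LSTMCell(hidden, hidden)      # long-horizon context / sub-goals
        self.fast = nn.LSTMCell(hidden * 2, hidden)  # per-step reflexes, sees slow state
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def initial_state(self, batch_size):
        z = lambda: torch.zeros(batch_size, self.hidden)
        return ((z(), z()), (z(), z()))

    def forward(self, obs, state, t):
        (hs, cs), (hf, cf) = state
        z = torch.relu(self.encoder(obs))
        if t % self.slow_period == 0:                # slow core updates infrequently
            hs, cs = self.slow(z, (hs, cs))
        hf, cf = self.fast(torch.cat([z, hs], dim=-1), (hf, cf))
        return self.policy_head(hf), self.value_head(hf), ((hs, cs), (hf, cf))

# Example rollout step:
policy = FastSlowPolicy(obs_dim=32, n_actions=6)
state = policy.initial_state(batch_size=1)
logits, value, state = policy(torch.randn(1, 32), state, t=0)
```

Because the slow state only changes every `slow_period` steps, it behaves like a slowly varying sub-goal signal that the fast core can react to at full frequency.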
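And a sketch of how the two optimization tiers and population-based training might fit together; `train_inner`, `evaluate_win_rate`, and the reward-weight dictionary are hypothetical stubs standing in for the real RL training and match evaluation:

```python
import copy
import random

POP_SIZE = 8

def train_inner(agent):
    """Tier 1 (internal RL loop): improve the agent's policy against its own
    internal reward, weighted by agent['w']. Stubbed as a random skill bump."""
    agent['skill'] += 0.01 * sum(agent['w'].values()) * random.random()

def evaluate_win_rate(agent, population):
    """Tier 2 objective: win rate in matches against the population (stubbed)."""
    return agent['skill'] + random.gauss(0.0, 0.05)

def mutate(w):
    """Perturb the internal-reward weights: the parameters the outer loop evolves."""
    return {k: v * random.choice([0.8, 1.2]) for k, v in w.items()}

population = [{'skill': 0.0,
               'w': {'flag_proximity': 1.0, 'survival': 1.0}}
              for _ in range(POP_SIZE)]

for generation in range(20):
    for agent in population:
        train_inner(agent)                                   # inner loop: internal reward
    scores = [evaluate_win_rate(a, population) for a in population]
    ranked = sorted(range(POP_SIZE), key=lambda i: scores[i])
    # Outer loop: replace the worst quartile with mutated copies of the best.
    for bad, good in zip(ranked[:POP_SIZE // 4], ranked[-(POP_SIZE // 4):]):
        population[bad] = copy.deepcopy(population[good])    # exploit a strong agent
        population[bad]['w'] = mutate(population[bad]['w'])  # explore reward weights
```

Note that the inner loop never sees the win/loss signal directly; the external objective reaches each agent only through the evolved reward weights.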