Addresses multi‐agent reinforcement learning (MARL) in environments requiring high communication by centralizing decision‐making via a joint policy.
Decomposes a joint action distribution into a sequence of conditional decisions, each conditioned on prior agents’ choices.
Uses a Transformer decoder to model the sequence of agent‐decisions, effectively learning “who decides when” in a centralized yet factorized manner.
Decision order (which agent acts first, second, etc.) is dynamic and learned via a Graph Neural Network (GNN) that predicts optimal ordering based on state.