Illustrated Comparison of Different Distributed Versions of PPO
Date: 7th July 2025
Blog post
Key Points
- Discussed a variety of different ways to distribute PPO.
- The main challenge is lag: by the time collected data is used for an update, the policy may already have changed, so the data can be off-policy.
- Discussed themes:
  - Synchronous: waits for all agents to compute their gradients before performing a weight update.
  - Asynchronous: does not wait; updates are applied as gradients arrive.
  - Centralised: a single server does all gradient accumulation and weight updates.
  - Decentralised: workers share gradients (all-reduce), but each keeps its own copy of the model.
- Discussed algos:
  - Asynchronous PPO: multiple CPU workers collect rollouts that are sent to a GPU for gradient calculation (it is not asynchronous with respect to gradients).
  - Distributed PPO: multiple actors and a single parameter server. Actors send rollouts; the server calculates gradients and applies weight updates.
  - Decentralised Distributed PPO: no parameter server; updates are decentralised.
  - Resource Flexible Distributed PPO: each worker can be used for either rollout collection or gradient calculation.
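Why lag makes data off-policy can be seen in PPO's clipped surrogate. This is a minimal sketch, assuming the standard per-sample clipped objective with a clip range of eps = 0.2 (the function names and probabilities are illustrative, not from the post). Samples generated by a stale policy produce importance ratios far from 1, and once the ratio passes the clip boundary in the direction the advantage favours, the objective is flat and that sample contributes no gradient.

```python
import math

def clipped_surrogate(logp_new, logp_behaviour, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate (to be maximised).

    logp_behaviour is the log-probability under the (possibly stale) policy
    that generated the sample; logp_new is under the current policy.
    """
    ratio = math.exp(logp_new - logp_behaviour)  # importance ratio
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)

def is_clipped(logp_new, logp_behaviour, advantage, eps=0.2):
    """True when the constant clipped term is selected, so the sample gives no gradient."""
    ratio = math.exp(logp_new - logp_behaviour)
    return (advantage > 0 and ratio > 1.0 + eps) or (advantage < 0 and ratio < 1.0 - eps)

# Fresh data: behaviour policy equals the current policy, so the ratio is 1.
print(clipped_surrogate(math.log(0.5), math.log(0.5), advantage=1.0))
# Stale data (hypothetical numbers): the action's probability has since risen
# from 0.3 to 0.6, giving a ratio of about 2, well outside [0.8, 1.2].
print(clipped_surrogate(math.log(0.6), math.log(0.3), advantage=1.0))
print(is_clipped(math.log(0.6), math.log(0.3), advantage=1.0))
```

The more updates happen between collecting a sample and using it, the further the ratio tends to drift, and the more of the batch ends up in the clipped, gradient-free region.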
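The synchronous/asynchronous distinction for a centralised server can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: a single scalar parameter, a quadratic toy loss per worker standing in for the PPO loss, and list order standing in for network arrival order; `grad`, `synchronous_step` and `asynchronous_round` are hypothetical names.

```python
def grad(theta, target):
    # Gradient of the toy loss 0.5 * (theta - target) ** 2 with respect to theta.
    return theta - target

def synchronous_step(theta, worker_targets, lr):
    # Synchronous: wait for every worker's gradient at the SAME theta,
    # then apply a single averaged update.
    grads = [grad(theta, t) for t in worker_targets]
    return theta - lr * sum(grads) / len(grads)

def asynchronous_round(theta, worker_targets, lr):
    # Asynchronous: all workers read theta at the start of the round, but the
    # server applies each gradient the moment it arrives (here, in list order),
    # so later gradients were computed on an out-of-date theta.
    snapshot = theta
    staleness = []
    for arrival, t in enumerate(worker_targets):
        theta -= lr * grad(snapshot, t)
        # Number of server updates applied between this worker's read and its own update.
        staleness.append(arrival)
    return theta, staleness

print(synchronous_step(0.0, [1.0, 3.0], lr=0.5))    # → 1.0
print(asynchronous_round(0.0, [1.0, 3.0], lr=0.5))  # → (2.0, [0, 1])
```

The asynchronous server never idles waiting for a slow worker, but it applied two updates from one read, and the second gradient was one update stale when applied.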
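Decentralised updates can be sketched without any parameter server. This is a minimal sketch, assuming plain Python lists as parameters and a naive averaging all-reduce (production setups typically use a collective-communication library with ring or tree all-reduce); the helper names are hypothetical. Because every worker applies the same averaged gradient to its own copy, the replicas stay identical.

```python
from typing import List

def all_reduce_mean(local_grads: List[List[float]]) -> List[List[float]]:
    # Naive all-reduce: every worker receives the element-wise mean of all
    # workers' gradients.
    n = len(local_grads)
    mean = [sum(column) / n for column in zip(*local_grads)]
    return [list(mean) for _ in range(n)]

def decentralised_step(replicas: List[List[float]], local_grads: List[List[float]],
                       lr: float) -> List[List[float]]:
    # No parameter server: each worker applies the shared averaged gradient
    # to its OWN copy of the parameters.
    reduced = all_reduce_mean(local_grads)
    return [[p - lr * g for p, g in zip(params, grads)]
            for params, grads in zip(replicas, reduced)]

replicas = [[1.0, 2.0], [1.0, 2.0]]      # two workers, identical initial weights
local_grads = [[1.0, -2.0], [3.0, 0.0]]  # each computed on that worker's own rollouts
replicas = decentralised_step(replicas, local_grads, lr=0.5)
print(replicas)  # → [[0.0, 2.5], [0.0, 2.5]]
```

All-reduce is itself a synchronisation point: every worker must contribute its gradient before any replica can update.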
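The rollout-shipping designs in the list (CPU workers feeding a GPU learner, or actors feeding a parameter server) can be sketched as a producer/consumer queue. This is a hypothetical single-machine sketch using Python threads: `PolicyVersion`, `actor`, `learner` and the one-update-per-rollout rule are assumptions for illustration, and the actual gradient step is elided. Each rollout is tagged with the policy version it was collected under, so the learner can measure how stale it is on arrival.

```python
import queue
import threading

class PolicyVersion:
    """Thread-safe counter for the learner's current policy version (hypothetical helper)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def get(self):
        with self._lock:
            return self._value

    def bump(self):
        with self._lock:
            self._value += 1

def actor(actor_id, version, rollouts, n_rollouts):
    # Collect rollouts with whatever policy version is current, tag them, and ship them.
    for step in range(n_rollouts):
        rollouts.put({"actor": actor_id, "version": version.get(), "step": step})

def learner(version, rollouts, total, lags):
    # Consume rollouts as they arrive; one (pretend) gradient update per rollout.
    for _ in range(total):
        rollout = rollouts.get()
        lags.append(version.get() - rollout["version"])  # policy lag, in updates
        version.bump()

def run(n_actors=4, n_rollouts=5):
    version = PolicyVersion()
    rollouts = queue.Queue()
    lags = []
    threads = [threading.Thread(target=actor, args=(i, version, rollouts, n_rollouts))
               for i in range(n_actors)]
    threads.append(threading.Thread(
        target=learner, args=(version, rollouts, n_actors * n_rollouts, lags)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return lags

lags = run()
print(f"max policy lag: {max(lags)} updates over {len(lags)} rollouts")
```

The measured lag depends on thread scheduling and varies between runs; any nonzero lag is exactly the gap that can make the data off-policy.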