dai-dao / PPO-Gluon

Implementation of PPO in Gluon / MXNet

  1. Add entropy term to encourage exploration

  2. GAE (Generalized Advantage Estimation)

  3. Distributional

  4. Other environments

  5. Bigger -> Slower nets

  6. The exploration noise causes NaN gradients, thus NaN outputs

  7. Need experience replay because it's obviously forgetting past experience.

  8. Use OpenAI examples

  9. Combine 2 nets into one -> Works -> Learns a bit slower I think

  10. Tuned hyper-parameters, specifically rollout size, number of updates, and batch size

  11. Next step -> Try GAE

  12. After -> Train in distributed setting with harder environments

  13. Compare to OpenAI baseline

  14. Incorporate into StarCraft
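The entropy term from item 1 can be folded into the standard PPO clipped surrogate. The repo itself uses Gluon/MXNet; this is a framework-agnostic NumPy sketch, and `clip_eps`/`ent_coef` are illustrative defaults, not necessarily the repo's values.

```python
import numpy as np

def ppo_loss(new_logp, old_logp, advantages, entropy,
             clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate plus an entropy bonus (sketch, not the repo's code).

    All arguments are per-sample NumPy arrays of equal length.
    """
    ratio = np.exp(new_logp - old_logp)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -np.mean(np.minimum(unclipped, clipped))
    # Subtracting mean entropy rewards high-entropy (exploratory) policies.
    return policy_loss - ent_coef * np.mean(entropy)
```

With `new_logp == old_logp` the ratio is 1 and the loss reduces to `-mean(advantages) - ent_coef * mean(entropy)`, which is a quick sanity check when wiring this into an autograd framework.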
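Items 2 and 11 mention GAE; a minimal backward-recursion version over one rollout might look like the following. `gamma` and `lam` here are common defaults, assumed rather than taken from the repo.

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a single rollout (sketch).

    values holds V(s_t) for each step; last_value bootstraps the state
    after the final step of the rollout.
    """
    adv = np.zeros(len(rewards))
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]   # TD error
        running = delta + gamma * lam * running               # discounted sum of deltas
        adv[t] = running
        next_value = values[t]
    return adv
```

Setting `gamma = lam = 1` degrades this to plain Monte Carlo advantages, which is handy for unit-testing the recursion.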
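For the NaN issue in item 6, one common guard is to clamp the log-std of the Gaussian exploration noise before exponentiating, so the std can neither collapse to zero nor overflow. The bounds below are illustrative, and this is a generic fix, not necessarily how this repo resolved it.

```python
import numpy as np

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # illustrative clamp bounds

def safe_gaussian_logp(actions, mean, log_std):
    """Log-prob of a diagonal Gaussian with the log-std clamped (sketch).

    Clamping keeps exp(log_std) finite and non-zero, avoiding the
    NaN gradients that appear when the noise scale collapses or explodes.
    """
    log_std = np.clip(log_std, LOG_STD_MIN, LOG_STD_MAX)
    std = np.exp(log_std)
    return np.sum(-0.5 * ((actions - mean) / std) ** 2
                  - log_std - 0.5 * np.log(2 * np.pi), axis=-1)
```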


Languages

Python 51.8%, Jupyter Notebook 48.2%