mauicv / BipedalWalker-v2-ddpg

Implementation of the DDPG algorithm to solve the OpenAI Gym BipedalWalker-v2 environment.

Bipedal Walker OpenAI gym Reinforcement Learning solution.



Resources:


Codebases:

Papers:

Posts:

Lessons:

A list of the stupid mistakes made while implementing this algorithm.

  1. Always check numpy array shapes. Specifically, that you haven't broadcast a (64,)-shaped array over a (64, 1)-shaped array! 🤦
  2. Check every variable. I spent ages trying to figure out why nothing was being learnt, only to discover that instead of returning states and next_states from the memory buffer sample I was returning states and states! 🤦
  3. Copied and pasted the actor network while building the critic and accidentally forgot to remove the tanh activation, meaning that for any state-action pair the critic could predict a total episode reward of at most 1 or at least -1! 🤦
  4. Left the hard-coded high action bound from training the Pendulum environment as a default when initializing the actor model. I correctly adjusted it for the actor on the agent class but not for the target actor, meaning the target actor would always output 2 times the action the actor would! 🤦
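Mistake 1 is easy to reproduce in isolation. This is a minimal sketch (not code from this repo) showing how NumPy silently broadcasts a (64,) array against a (64, 1) array into a (64, 64) matrix, and the reshape that avoids it:

```python
import numpy as np

# A (64,) array combined with a (64, 1) array broadcasts to (64, 64)
# instead of giving an elementwise result -- almost never what you want.
rewards = np.zeros(64)          # shape (64,)
q_values = np.zeros((64, 1))    # shape (64, 1)

broadcast = rewards + q_values  # shape (64, 64): the silent bug
print(broadcast.shape)          # (64, 64)

# The fix: make both shapes explicit before combining.
aligned = rewards.reshape(-1, 1) + q_values  # shape (64, 1)
print(aligned.shape)            # (64, 1)
```

A loss computed on the accidental (64, 64) matrix still runs without error, which is what makes this bug so hard to spot.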
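Mistake 2 can be illustrated with a toy replay buffer. This is a hypothetical minimal version (the buffer layout and `sample` helper are assumptions, not this repo's code) showing the correct unpacking; the bug was returning `states` in place of `next_states`:

```python
import random
from collections import deque

# Toy replay buffer of (state, action, reward, next_state, done) tuples.
buffer = deque(maxlen=1000)
buffer.append((0.0, 1, 1.0, 0.5, False))
buffer.append((0.5, 0, 0.0, 1.0, True))

def sample(batch_size):
    batch = random.sample(buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    # The bug was returning `states` twice here instead of next_states,
    # so the TD target was computed from the wrong transition.
    return states, actions, rewards, next_states, dones

states, _, _, next_states, _ = sample(2)
print(states, next_states)  # with the bug, these two were identical
```

With the bug, the critic's bootstrap target Q(s, a) used s again instead of s', so nothing meaningful was ever learnt while every line still executed cleanly.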
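Mistake 3 comes down to tanh saturation: a tanh output layer squashes every critic estimate into (-1, 1), so returns larger than 1 can never be represented. A small numeric sketch:

```python
import numpy as np

# Plausible episode returns a critic might need to predict.
q_raw = np.array([0.5, 10.0, 250.0])

# With a leftover tanh on the output layer, every Q estimate is
# squashed into (-1, 1) -- large returns become indistinguishable.
q_tanh = np.tanh(q_raw)
print(q_tanh)  # all values strictly inside (-1, 1)

# The correct critic head is linear (identity), leaving q_raw unbounded.
```

The actor legitimately uses tanh to bound actions, which is exactly why copy-pasting it into the critic is such an easy trap.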
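Mistake 4 can be sketched with a stand-in actor. This is a hypothetical `make_actor` (an assumption for illustration, not this repo's model code) where a forgotten Pendulum-era default bound makes the target actor scale actions 2x relative to the corrected actor:

```python
import numpy as np

def make_actor(action_bound=2.0):  # 2.0: leftover default from Pendulum
    # Minimal stand-in policy: tanh output scaled by the action bound.
    def act(state):
        return action_bound * np.tanh(state)
    return act

# Bound corrected on the agent's actor (BipedalWalker actions are in [-1, 1])
actor = make_actor(action_bound=1.0)
# ...but the target actor was built with the forgotten default.
target_actor = make_actor()

s = np.array([0.3])
print(actor(s), target_actor(s))  # target outputs exactly 2x the actor
```

Since the target actor only feeds the critic's TD target, the mismatch never raises an error; it just quietly corrupts every value estimate.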

Languages

Jupyter Notebook 91.5%, Python 8.5%