mauicv / BipedalWalker-v2-ddpg

Implementation of the DDPG algorithm to solve the OpenAI Gym BipedalWalker-v2 environment.

Bipedal Walker OpenAI gym Reinforcement Learning solution.



Resources:


Codebases:

Papers:

Posts:

Lessons:

A list of the stupid mistakes made while implementing this algorithm.

  1. Always check numpy array shapes. Specifically, that you haven't broadcast a (64,)-shaped array over a (64, 1)-shaped array! 🤦
  2. Check every variable. I spent ages trying to figure out why nothing was being learnt, only to discover that instead of returning states and next_states from the memory buffer sample I was returning states and states! 🤦
  3. Copied and pasted the actor network while building the critic and accidentally forgot to remove the tanh activation, meaning that for any state-action pair the critic could predict a total episode reward of at most 1 or at least -1! 🤦
  4. Left the hard-coded high action bound from training the Pendulum environment as a default when initializing the actor model. I correctly adjusted it for the actor on the agent class but not for the target actor, meaning the target actor would always output 2 times the action the actor would! 🤦
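Mistake 1 is easy to reproduce in isolation. This is a minimal sketch (not code from this repo) showing how NumPy silently broadcasts a (64,) array against a (64, 1) array into a (64, 64) matrix, and the reshape that avoids it:

```python
import numpy as np

# A (64,) array combined with a (64, 1) array broadcasts to (64, 64)
# instead of giving an elementwise result -- almost never what you want.
rewards = np.zeros(64)          # shape (64,)
q_values = np.zeros((64, 1))    # shape (64, 1)

broadcast = rewards + q_values  # shape (64, 64): the silent bug
print(broadcast.shape)          # (64, 64)

# The fix: make both shapes explicit before combining.
aligned = rewards.reshape(-1, 1) + q_values  # shape (64, 1)
print(aligned.shape)            # (64, 1)
```

A loss computed on the accidental (64, 64) matrix still runs without error, which is what makes this bug so hard to spot.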
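Mistake 2 can be illustrated with a toy replay buffer. This is a hypothetical minimal version (the buffer layout and `sample` helper are assumptions, not this repo's code) showing the correct unpacking; the bug was returning `states` in place of `next_states`:

```python
import random
from collections import deque

# Toy replay buffer of (state, action, reward, next_state, done) tuples.
buffer = deque(maxlen=1000)
buffer.append((0.0, 1, 1.0, 0.5, False))
buffer.append((0.5, 0, 0.0, 1.0, True))

def sample(batch_size):
    batch = random.sample(buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    # The bug was returning `states` twice here instead of next_states,
    # so the TD target was computed from the wrong transition.
    return states, actions, rewards, next_states, dones

states, _, _, next_states, _ = sample(2)
print(states, next_states)  # with the bug, these two were identical
```

With the bug, the critic's bootstrap target Q(s, a) used s again instead of s', so nothing meaningful was ever learnt while every line still executed cleanly.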
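Mistake 3 comes down to tanh saturation: a tanh output layer squashes every critic estimate into (-1, 1), so returns larger than 1 can never be represented. A small numeric sketch:

```python
import numpy as np

# Plausible episode returns a critic might need to predict.
q_raw = np.array([0.5, 10.0, 250.0])

# With a leftover tanh on the output layer, every Q estimate is
# squashed into (-1, 1) -- large returns become indistinguishable.
q_tanh = np.tanh(q_raw)
print(q_tanh)  # all values strictly inside (-1, 1)

# The correct critic head is linear (identity), leaving q_raw unbounded.
```

The actor legitimately uses tanh to bound actions, which is exactly why copy-pasting it into the critic is such an easy trap.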
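Mistake 4 can be sketched with a stand-in actor. This is a hypothetical `make_actor` (an assumption for illustration, not this repo's model code) where a forgotten Pendulum-era default bound makes the target actor scale actions 2x relative to the corrected actor:

```python
import numpy as np

def make_actor(action_bound=2.0):  # 2.0: leftover default from Pendulum
    # Minimal stand-in policy: tanh output scaled by the action bound.
    def act(state):
        return action_bound * np.tanh(state)
    return act

# Bound corrected on the agent's actor (BipedalWalker actions are in [-1, 1])
actor = make_actor(action_bound=1.0)
# ...but the target actor was built with the forgotten default.
target_actor = make_actor()

s = np.array([0.3])
print(actor(s), target_actor(s))  # target outputs exactly 2x the actor
```

Since the target actor only feeds the critic's TD target, the mismatch never raises an error; it just quietly corrupts every value estimate.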

Languages

Jupyter Notebook 91.5%, Python 8.5%