pemami4911 / deep-rl

Collection of Deep Reinforcement Learning algorithms

Possible regression to do with batch normalization

JanMatas opened this issue

Hi,

I just tried your code for the first time and was disappointed to see that, even after 500+ episodes, the rewards for the Pendulum env were still below -1000. I poked around a little, and after reverting the latest commit (f242533) the algorithm works as expected and achieves good results after around 100 episodes. It seems that commit introduced a regression.

Oops, thanks. I've commented it out by default, and the master branch has been updated.