Possible regression to do with batch normalization
JanMatas opened this issue
Jan Matas commented
Hi,
I just tried your code for the first time and was disappointed to see that even after 500+ episodes, the rewards for the Pendulum environment were still below -1000. I poked around a little, and after reverting the latest commit (f242533) the algorithm works as expected and achieves good results after around 100 episodes. It looks like that commit introduced a regression.
Patrick Emami commented
Oops, thanks. I've commented it out by default, and the master branch has been updated.