Possible regression to do with batch normalization

Question

JanMatas opened this issue 6 years ago

Hi,

I just tried your code for the first time and was disappointed to see that, even after 500+ episodes, the rewards for the Pendulum env were still below -1000. I poked around a little, and after reverting the latest commit (f242533) the algorithm works as expected and achieves good results after around 100 episodes. That commit seems to have introduced a regression.
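For anyone who wants to repeat the comparison, here is a rough sketch of how the "still below -1000" check could be made concrete. It is not part of the repository: the `has_learned` helper, the 20-episode window, and the reward lists are all assumptions for illustration. Only the -1000 level comes from the numbers above.

```python
from statistics import mean


def has_learned(episode_rewards, window=20, threshold=-1000.0):
    """Return True if the mean of the last `window` episode rewards is above `threshold`.

    `window` and this helper are hypothetical; the -1000 default mirrors the
    reward level reported in the issue.
    """
    if len(episode_rewards) < window:
        # Not enough episodes yet to judge.
        return False
    return mean(episode_rewards[-window:]) > threshold


# Made-up reward histories, purely to show the call:
stuck_run = [-1200.0] * 500                       # never improves
improving_run = [-1200.0] * 80 + [-200.0] * 20    # improves near episode 100

print(has_learned(stuck_run))
print(has_learned(improving_run))
```

Running this on the logged episode rewards with and without the commit should make the difference easy to see.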

Patrick Emami · Answer 1 · Fri Mar 09 2018 08:24:54 GMT+0800 (China Standard Time)

Oops, thanks. I've commented it out by default, and the master branch has been updated.