Ensure that agent earns high reward on Colab
dniku opened this issue
Dmitry Nikulin commented
Sebastian Kosch commented
Apparently you just need to train for much longer to see any results, even on Pong: at least 5 million timesteps.
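To put that budget in perspective, here is a rough sketch of how many rollout/update cycles 5 million timesteps would imply. The helper `updates_for_budget` and the values of `n_envs` and `n_steps` are assumptions for illustration only, not settings taken from this repo.

```python
import math


def updates_for_budget(total_timesteps: int, n_envs: int, n_steps: int) -> int:
    """Return how many rollout/update cycles are needed to collect
    at least `total_timesteps` environment steps."""
    steps_per_update = n_envs * n_steps  # timesteps gathered per rollout
    return math.ceil(total_timesteps / steps_per_update)


# Hypothetical settings: 8 parallel environments, 128 steps per rollout.
print(updates_for_budget(5_000_000, n_envs=8, n_steps=128))  # prints 4883
```

With these assumed settings that is several thousand updates, so a short Colab session may well end before the reward starts to improve.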