Replication of Paper

Question

bhairavmehta95 opened this issue 6 years ago

Are the current default hyperparameters the same as the ones reported in the paper? Many of the paper's Half-Cheetah trials appear to break 6000 within 200 episodes and 10000 by 1000 episodes, but I'm having a hard time reproducing these results.

But, thanks for your code, it's very helpful!

Tuomas Haarnoja · Answer 1 · Fri Mar 23 2018 03:45:28 GMT+0800 (China Standard Time)

Glad you like it! All hyperparameters should be the same except n_train_repeat (the number of gradient steps per environment step). The current default is 1, which makes the algorithm run faster in wall-clock time but gives worse sample complexity. The results in Figure 2 use n_train_repeat=16, which takes roughly 16 times longer to train.
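To make the trade-off concrete, here is a minimal sketch of how a setting like n_train_repeat typically enters a training loop. It is not the repository's actual code: `run_training`, `env_step`, `gradient_step`, and `count_updates` are hypothetical names used only for illustration, and the environment and update steps are replaced by simple counters.

```python
def run_training(n_env_steps, n_train_repeat, env_step, gradient_step):
    """Take `n_train_repeat` gradient steps after every environment step."""
    for _ in range(n_env_steps):
        env_step()  # collect one transition (stubbed out here)
        for _ in range(n_train_repeat):
            gradient_step()  # one update of the networks (stubbed out here)


class Counter:
    """Callable stand-in that only records how often it was invoked."""

    def __init__(self):
        self.count = 0

    def __call__(self):
        self.count += 1


def count_updates(n_env_steps, n_train_repeat):
    """Return (environment steps, gradient steps) performed by the loop."""
    env_counter, grad_counter = Counter(), Counter()
    run_training(n_env_steps, n_train_repeat, env_counter, grad_counter)
    return env_counter.count, grad_counter.count


if __name__ == "__main__":
    print(count_updates(1000, 1))
    print(count_updates(1000, 16))
```

With the same number of environment steps, n_train_repeat=16 performs 16 times as many gradient updates as n_train_repeat=1. That is consistent with the roughly 16-fold longer training time mentioned above, assuming the gradient steps dominate the run time.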