haarnoja / sac

Soft Actor-Critic

Replication of Paper

bhairavmehta95 opened this issue

Are the current hyperparameters the same ones that the paper reports? In the paper, many of the Half-Cheetah trials seem to exceed a return of 6000 within 200 episodes and 10000 by 1000 episodes, but I'm having a hard time reproducing these results.

Thanks for your code, though; it's very helpful!

Glad you like it! All hyperparameters should match the paper except n_train_repeat, the number of gradient steps per environment step. The current default is 1, which makes the algorithm run faster in wall-clock time but gives worse sample complexity. The results in Figure 2 use n_train_repeat=16, which takes roughly 16 times longer to train.
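To illustrate the trade-off described in the reply, here is a minimal sketch of a generic off-policy training loop in which a knob like n_train_repeat sets how many gradient updates are taken per environment step. This is not the sac repository's actual implementation: the toy 1-D environment, run_training, and placeholder_update are all hypothetical stand-ins. The point is only that the number of environment samples stays fixed while the number of gradient steps (and thus compute) scales linearly with n_train_repeat.

```python
import random


def placeholder_update(batch):
    """Hypothetical stand-in for one gradient step on the actor/critic.
    Here it just computes the mean reward of the minibatch."""
    return sum(reward for _, _, reward, _ in batch) / len(batch)


def run_training(n_env_steps, n_train_repeat, batch_size=4, seed=0):
    """Toy off-policy loop (hypothetical): collect one transition per
    environment step, then take n_train_repeat gradient steps on
    minibatches sampled from the replay buffer.
    Returns (env_steps, gradient_steps)."""
    rng = random.Random(seed)
    replay_buffer = []
    env_steps = 0
    gradient_steps = 0
    state = 0.0
    for _ in range(n_env_steps):
        # Toy 1-D environment transition, standing in for env.step(action).
        action = rng.uniform(-1.0, 1.0)
        next_state = state + action
        reward = -abs(next_state)
        replay_buffer.append((state, action, reward, next_state))
        state = next_state
        env_steps += 1

        # More updates per collected sample: better sample efficiency,
        # but proportionally more compute (wall-clock time) per env step.
        for _ in range(n_train_repeat):
            # Sample with replacement so a minibatch is available from step 1.
            batch = rng.choices(replay_buffer, k=batch_size)
            placeholder_update(batch)
            gradient_steps += 1
    return env_steps, gradient_steps


if __name__ == "__main__":
    for repeat in (1, 16):
        env_steps, grad_steps = run_training(1000, repeat)
        print(f"n_train_repeat={repeat}: {env_steps} env steps, {grad_steps} gradient steps")
```

With the same 1000 environment steps, n_train_repeat=16 performs 16 times as many gradient updates as n_train_repeat=1, which matches the reply's estimate of roughly 16 times longer training.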