danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2


Have you considered using a PPO actor instead of a normal Actor-Critic?

outdoteth opened this issue · comments

I think a lot of improvement could be made by using a PPO actor.

PPO clips the policy probability ratio so that it can safely take multiple gradient steps on the same batch of collected data, which becomes slightly off-policy after the first update. DreamerV2 uses a world model and can therefore generate an essentially unlimited amount of fresh on-policy data without interacting with the environment, so there is little point in training on the same imagined trajectories multiple times.
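
For reference, here is a minimal sketch of PPO's clipped surrogate objective (the function name and numpy formulation are illustrative, not code from this repo); it shows that the clipping applies to the ratio between the updated policy and the data-collecting policy, which is the mechanism that makes reusing the same batch safe:

```python
import numpy as np

def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    Clipping the probability ratio limits how far the updated policy can move
    from the policy that collected the data, so several gradient steps can be
    taken on the same (now slightly off-policy) batch.
    """
    ratio = np.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Take the pessimistic (minimum) term and average over the batch.
    return np.mean(np.minimum(unclipped, clipped))
```

Since DreamerV2 can always imagine new trajectories under the current policy, the off-policy correction that this clipping provides buys little, which is the point made above.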