danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2

Intrinsic Rewards

robjlyons opened this issue

Is it possible to add the use of intrinsic rewards to this method?

Thanks

Plan2Explore is implemented in this code base via --expl_behavior Plan2Explore. The task policy is still trained on the rewards from the environment, but it is only used for computing eval scores, not for data collection. You can also set --expl_until 1e6 if you want to switch to collecting data via the task policy after 1M steps. By default, the exploration policy uses no external rewards, but there is a config option for that, too. Check out the exploration section in configs.yaml.
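To make the switching behavior described above concrete, here is a minimal sketch of which policy collects data at a given environment step. This is not the code from the repository: the function name collection_policy is hypothetical, the string "greedy" stands in for "no separate exploration policy" and is an assumption, and the exact boundary condition at expl_until is also an assumption.

```python
def collection_policy(step, expl_behavior="Plan2Explore", expl_until=0):
    """Return which policy gathers environment data at a given step.

    Illustrative only. Assumes expl_until == 0 means "explore forever"
    and that the switch happens once step reaches expl_until.
    """
    if expl_behavior == "greedy":
        # Assumed name for "no exploration policy": the task policy collects.
        return "task"
    if expl_until and step >= expl_until:
        # Past the cutoff, data collection falls back to the task policy.
        return "task"
    return "exploration"


# With --expl_until 1e6, exploration collects data for the first 1M steps.
print(collection_policy(500_000, expl_until=int(1e6)))
print(collection_policy(1_500_000, expl_until=int(1e6)))
```

In either phase, evaluation scores would come from the task policy, as the reply states.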

Edit: For future reference, the correct flag is --expl_behavior Plan2Explore, and the implementation is in expl.py. Sorry for the noise!
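For readers wondering what an intrinsic reward of this kind might look like, the following is a rough sketch of the general Plan2Explore idea: reward the agent where an ensemble of predictors disagrees. It is not taken from expl.py. The function names and the intr_scale and extr_scale parameters are hypothetical, so check the exploration section of configs.yaml for the actual option names.

```python
from statistics import pstdev


def disagreement(predictions):
    """Mean per-dimension standard deviation across ensemble members.

    predictions: one list of floats per ensemble member, all the same length.
    """
    dims = list(zip(*predictions))  # regroup values by feature dimension
    return sum(pstdev(d) for d in dims) / len(dims)


def exploration_reward(predictions, extrinsic=0.0, intr_scale=1.0, extr_scale=0.0):
    """Mix intrinsic (disagreement) and extrinsic (environment) reward.

    The default scales mirror "no external rewards by default";
    both parameter names are assumptions made for this sketch.
    """
    return intr_scale * disagreement(predictions) + extr_scale * extrinsic


# Two ensemble members that disagree on the first feature only.
preds = [[0.0, 0.0], [2.0, 0.0]]
print(exploration_reward(preds))
print(exploration_reward(preds, extrinsic=1.0, extr_scale=1.0))
```

With a nonzero extr_scale, the exploration policy would also see the environment reward, which corresponds to the optional configuration mentioned in the reply.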