Intrinsic Rewards

robjlyons opened this issue.

Is it possible to use intrinsic rewards with this method?

Thanks

Danijar Hafner · Answer 1 · Thu Aug 05 2021 04:38:17 GMT+0800 (China Standard Time)

Plan2Explore is implemented in this code base via --expl_behavior plan2explore. The task policy is still trained on the environment rewards, but it is only used to compute evaluation scores, not to collect data. If you want to switch to collecting data with the task policy after 1M steps, also set --expl_until 1e6. By default, the exploration policy uses no extrinsic (environment) rewards, but there is a config option for that, too. Check out the exploration section in configs.yaml.
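Plan2Explore's intrinsic reward is the disagreement of an ensemble of one-step predictors: where the ensemble members disagree, the world model is uncertain, so visiting those states is rewarded. The NumPy sketch below shows only that reward computation, given ensemble predictions that are assumed to already exist. It is not the code in expl.py, and it does not train the ensemble. The function names `disagreement_reward` and `exploration_reward` and the weights `intr_scale` and `extr_scale` are hypothetical. They illustrate the intrinsic/extrinsic weighting mentioned above; check configs.yaml for the actual option names.

```python
import numpy as np


def disagreement_reward(ensemble_preds: np.ndarray) -> np.ndarray:
    """Intrinsic reward from ensemble disagreement (illustrative sketch).

    ensemble_preds: array of shape (num_models, batch, feature_dim) holding
    each ensemble member's prediction of the next latent features.
    Returns an array of shape (batch,).
    """
    # Standard deviation across ensemble members, averaged over features.
    return ensemble_preds.std(axis=0).mean(axis=-1)


def exploration_reward(ensemble_preds: np.ndarray,
                       extr_reward: np.ndarray,
                       intr_scale: float = 1.0,
                       extr_scale: float = 0.0) -> np.ndarray:
    """Weighted mix of intrinsic and extrinsic reward (hypothetical weights).

    extr_scale defaults to 0.0, so by default the exploration reward ignores
    the environment reward entirely.
    """
    intr = disagreement_reward(ensemble_preds)
    return intr_scale * intr + extr_scale * extr_reward
```

For example, two ensemble members that predict all zeros and all ones disagree with a standard deviation of 0.5 in every feature, so each batch element gets an intrinsic reward of 0.5. Identical predictions give zero intrinsic reward. Setting `extr_scale` above zero mixes the environment reward back into the exploration objective.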

mjlbach · Answer 2 · Thu Oct 07 2021 03:40:22 GMT+0800 (China Standard Time)

Edit: For future reference, the correct flag is --expl_behavior Plan2Explore (capitalized). You can find the implementation in expl.py. Sorry for the noise!