Reward scale
lgvaz opened this issue · comments
Certain reward scale values can generate instabilities, as described in #9 .
To alleviate this issue, wouldn't it be a good idea to divide log_prob by reward_scale instead of multiplying the reward by it? Algorithmically speaking, I think this would have the same effect.
That's right, you can alternatively divide log_prob by reward_scale for the same effect. It can indeed be slightly more stable, especially at the beginning of learning.
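A minimal sketch of why the two are equivalent, assuming a SAC-style soft target of the form reward_scale * r - log_prob (the array values below are purely illustrative):

```python
import numpy as np

reward_scale = 5.0
rewards = np.array([1.0, -0.5, 2.0])       # illustrative rewards
log_prob = np.array([-1.2, 0.3, -0.7])     # illustrative policy log-probabilities

# Option A: multiply the reward by reward_scale.
target_a = reward_scale * rewards - log_prob

# Option B: divide log_prob by reward_scale instead.
target_b = rewards - log_prob / reward_scale

# The two targets differ only by the constant factor reward_scale,
# so they induce the same optimal policy; option B keeps the
# target magnitudes smaller, which can be slightly more stable.
assert np.allclose(target_a, reward_scale * target_b)
```

The overall factor of reward_scale rescales gradients but not the optimum, so either form trades off reward against entropy identically.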