haarnoja / sac

Soft Actor-Critic

Reward scale

lgvaz opened this issue

Some reward scale values can generate instabilities, as described in #9.

To alleviate this issue, wouldn't it be a good idea to divide log_prob by reward_scale instead of multiplying the reward by it? Algorithmically speaking, I think this would have the same effect.

That's right, you can alternatively divide log_prob by reward_scale for the same effect. It can indeed be slightly more stable, especially at the beginning of learning.
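
To make the equivalence concrete, here is a minimal NumPy sketch of a one-step soft Bellman target written both ways. It is not code from this repository: the function names, the `gamma` default, and the random inputs are all assumptions made for illustration. The point it shows is that if the Q-values in the second form are the first form's Q-values divided by reward_scale, the two targets differ only by that same constant factor, so both forms rank actions identically.

```python
import numpy as np


def target_scaled_reward(reward, next_q, next_log_prob, reward_scale, gamma=0.99):
    """Soft target with the reward multiplied by reward_scale (hypothetical helper)."""
    return reward_scale * reward + gamma * (next_q - next_log_prob)


def target_scaled_log_prob(reward, next_q, next_log_prob, reward_scale, gamma=0.99):
    """Soft target with log_prob divided by reward_scale instead (hypothetical helper)."""
    return reward + gamma * (next_q - next_log_prob / reward_scale)


# Made-up batch of transitions, for illustration only.
rng = np.random.default_rng(0)
reward = rng.normal(size=5)
next_q = rng.normal(size=5)
next_log_prob = rng.normal(size=5)
reward_scale = 10.0

y_reward = target_scaled_reward(reward, next_q, next_log_prob, reward_scale)

# In the second form the Q-function lives on a scale that is smaller by
# a factor of reward_scale, so we feed it next_q / reward_scale.
y_log_prob = target_scaled_log_prob(
    reward, next_q / reward_scale, next_log_prob, reward_scale
)

# The two targets agree up to the constant factor reward_scale.
targets_match = np.allclose(y_reward, reward_scale * y_log_prob)
print(targets_match)
```

In the second form the targets, and therefore the learned Q-values, are smaller by a factor of reward_scale, which fits the remark above about stability early in training. Gradient magnitudes differ by the same factor, so the two variants are not step-for-step identical under a fixed learning rate.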