REINFORCE actor update missing discount factor
GreatArcStudios opened this issue
Eric Zhu commented
It seems that the REINFORCE implementation may be missing the discount factor in the actor update. As we can see in _actor_learn_batch, the loss is computed simply as:
loss = torch.sum(negative_log_probs * return_batch)
Is this intentional?
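For concreteness, here is a minimal sketch in plain Python (no PyTorch, so it stays self-contained) of the two losses being compared for a single episode. It assumes that return_batch holds the discounted reward-to-go G_t for each step and that negative_log_probs holds -log π(a_t|s_t). The function names are hypothetical and are not taken from the repository. The only difference between the two losses is the extra γ^t weight on each term.

```python
from typing import List


# Hypothetical helper: reward-to-go for ONE episode, G_t = r_t + gamma * G_{t+1}.
def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):  # accumulate backwards from the last step
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical: mirrors torch.sum(negative_log_probs * return_batch),
# where every time step gets the same weight (no gamma**t factor).
def loss_without_gamma_t(neg_log_probs: List[float], returns: List[float]) -> float:
    return sum(nlp * g for nlp, g in zip(neg_log_probs, returns))


# Hypothetical: textbook REINFORCE, where the term for step t is also scaled by gamma**t.
def loss_with_gamma_t(neg_log_probs: List[float], returns: List[float], gamma: float) -> float:
    return sum(
        (gamma ** t) * nlp * g
        for t, (nlp, g) in enumerate(zip(neg_log_probs, returns))
    )


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    neg_log_probs = [0.5, 0.5, 0.5]
    gamma = 0.9
    g = discounted_returns(rewards, gamma)
    print(g)
    print(loss_without_gamma_t(neg_log_probs, g))
    print(loss_with_gamma_t(neg_log_probs, g, gamma))
```

With gamma = 1.0 the two losses coincide. For gamma < 1, the version without γ^t gives later time steps relatively more weight than the discounted objective prescribes.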
yiwan-rl commented
Yes, it is intentional. In theory, the actor update should include the discount factor (the term for time step t should be weighted by γ^t), but in practice it is common to drop it. If you are interested in more details, several works study this mismatch (e.g., https://arxiv.org/pdf/2010.01069.pdf, https://arxiv.org/abs/1906.07073).
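Written out for a single episode of length T, the two estimators differ only in the γ^t weight. Here r_k denotes the reward received after action a_k. This notation is chosen for illustration and is not taken from the repository.

```latex
\begin{aligned}
G_t &= \sum_{k=t}^{T-1} \gamma^{k-t} r_k, \\
\text{theory:}\quad \widehat{\nabla_\theta J} &= \sum_{t=0}^{T-1} \gamma^{t}\, G_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t), \\
\text{common practice:}\quad \widehat{\nabla_\theta J} &= \sum_{t=0}^{T-1} G_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t).
\end{aligned}
```

Minimizing loss = torch.sum(negative_log_probs * return_batch) by gradient descent follows the second expression. The first is the gradient estimate for the discounted objective from the start state. Because γ^t shrinks as t grows, dropping it gives later time steps relatively more weight.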
Eric Zhu commented
I see, thanks for the reference!