REINFORCE actor update missing discount factor

Question

GreatArcStudios opened this issue 5 months ago

It seems that the REINFORCE implementation may be missing the discount factor in actor updates. The loss is just computed as:

loss = torch.sum(negative_log_probs * return_batch)

This comes from _actor_learn_batch. Is this intentional?

yiwan-rl · Answer 1 · Sat Jan 13 2024 02:30:39 GMT+0800 (China Standard Time)

Yes, it is intentional. In theory, the actor's update should include the discount factor (the term for time step t is weighted by gamma^t), but in practice it is common to omit it. If you want to read more, several works study this mismatch (e.g., https://arxiv.org/pdf/2010.01069.pdf, https://arxiv.org/abs/1906.07073).
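
To make the difference concrete, here is a minimal sketch of the two variants of the loss for a single episode. It uses plain Python lists rather than torch so it runs without dependencies. The helper names (discounted_returns, reinforce_loss, use_discount_weight) and the toy numbers are made up for illustration and are not taken from the library's code; it also assumes return_batch holds discounted returns, which the thread does not confirm.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = sum_{k >= t} gamma^(k - t) * r_k for each step t of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(negative_log_probs, returns, gamma, use_discount_weight):
    """Sum over steps of weight_t * (-log pi(a_t | s_t)) * G_t.

    use_discount_weight=False mirrors the loss quoted in the question
    (every step has weight 1). use_discount_weight=True adds the gamma^t
    factor that the theoretical update includes.
    """
    total = 0.0
    for t, (nlp, g) in enumerate(zip(negative_log_probs, returns)):
        weight = gamma ** t if use_discount_weight else 1.0
        total += weight * nlp * g
    return total


# Toy episode (hypothetical values chosen so the arithmetic is easy to follow).
rewards = [1.0, 1.0, 1.0]
negative_log_probs = [1.0, 1.0, 1.0]
gamma = 0.5

returns = discounted_returns(rewards, gamma)
loss_common = reinforce_loss(negative_log_probs, returns, gamma, use_discount_weight=False)
loss_theory = reinforce_loss(negative_log_probs, returns, gamma, use_discount_weight=True)
print(returns, loss_common, loss_theory)
```

With the gamma^t weight, later time steps contribute less to the gradient; without it, every step in the episode counts equally.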

Eric Zhu · Answer 2 · Sat Jan 13 2024 04:46:47 GMT+0800 (China Standard Time)

I see, thanks for the reference!