facebookresearch / Pearl

A Production-ready Reinforcement Learning AI Agent Library brought by the Applied Reinforcement Learning team at Meta.

REINFORCE actor update missing discount factor

GreatArcStudios opened this issue

It seems that the REINFORCE implementation may be missing the discount factor in the actor update. In _actor_learn_batch, the loss is computed simply as:

loss = torch.sum(negative_log_probs * return_batch)

Is this intentional?
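For concreteness, here is a minimal sketch of what I had in mind, assuming each transition also stores its time step t within the episode (the step_in_episode_batch name and the toy values below are hypothetical, not Pearl's actual API):

import torch

gamma = 0.99  # hypothetical discount factor

# Toy batch of three transitions from a single episode (illustrative values only).
negative_log_probs = torch.tensor([1.2, 0.8, 0.5])    # -log pi(a_t | s_t)
return_batch = torch.tensor([3.0, 2.0, 1.0])          # return G_t for each transition
step_in_episode_batch = torch.tensor([0, 1, 2])       # time step t of each transition

# Undiscounted actor loss, as computed in _actor_learn_batch:
undiscounted_loss = torch.sum(negative_log_probs * return_batch)

# "Theoretically correct" discounted policy gradient: weight each term by gamma^t.
discount_weights = gamma ** step_in_episode_batch.float()
discounted_loss = torch.sum(discount_weights * negative_log_probs * return_batch)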

Yes, it is intentional. While in theory the actor's update should include the discount factor, in practice it is common to omit it. If you are interested in reading more about this, there are several works studying this mismatch (e.g., https://arxiv.org/pdf/2010.01069.pdf, https://arxiv.org/abs/1906.07073).

I see, thanks for the references!