seungeunrho / minimalRL

Implementations of basic RL algorithms with minimal lines of code! (pytorch based)

Wrong gradient flow in bias correction term of ACER?

wwiiiii opened this issue

loss2 = -correction_coeff * pi * torch.log(pi) * (q.detach()-v) # bias correction term

According to the original paper, the gradient of the bias correction term is defined as follows:
[Image: the bias correction term of the ACER policy gradient, from the original paper]
Since pi serves only as the probability weighting in the expectation, it seems it is not a target of optimization.
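For reference, the bias correction part of the ACER policy gradient (Wang et al., "Sample Efficient Actor-Critic with Experience Replay") can be written roughly as below, where $\rho_t(a) = \pi(a \mid x_t)/\mu(a \mid x_t)$ is the importance ratio against the behavior policy $\mu$, $c$ is the truncation threshold and $[x]_+ = \max(0, x)$. This is a transcription in approximately the paper's notation and may differ cosmetically from the image in the issue.

$$
\mathbb{E}_{a \sim \pi}\!\left[\left[\frac{\rho_t(a) - c}{\rho_t(a)}\right]_+ \nabla_\theta \log \pi_\theta(a \mid x_t)\,\bigl(Q_{\theta_v}(x_t, a) - V_{\theta_v}(x_t)\bigr)\right]
= \sum_a \pi(a \mid x_t) \left[\frac{\rho_t(a) - c}{\rho_t(a)}\right]_+ \nabla_\theta \log \pi_\theta(a \mid x_t)\,\bigl(Q_{\theta_v}(x_t, a) - V_{\theta_v}(x_t)\bigr)
$$

Written as an explicit sum, $\pi(a \mid x_t)$ only weights each action, and the gradient is meant to act through $\log \pi_\theta$ alone. If the weight is differentiated as well (holding the coefficient and $Q - V$ fixed), the product rule adds an extra term that is not part of the estimator:

$$
\nabla_\theta \bigl[\pi \log \pi\bigr] = \pi\,\nabla_\theta \log \pi + \log \pi\,\nabla_\theta \pi
$$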

Shouldn't we detach pi from the computational graph in the line above?
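A small numerical check can make the difference concrete. The sketch below is a pure-Python toy (no PyTorch), assuming a single state with softmax policy logits `z`, a per-action truncation coefficient `coeff` (standing in for `correction_coeff`) and a per-action advantage `adv` (standing in for `q.detach()-v`), both held constant; all of these names and values are hypothetical. It compares finite-difference gradients of the loss with pi frozen (the analogue of `pi.detach()`) and with pi left in the graph against the gradient the estimator asks for.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def loss_pi_detached(z, pi_weight, coeff, adv):
    # pi_weight is a frozen copy of pi: the analogue of pi.detach()
    pi = softmax(z)
    return -sum(c * w * math.log(p) * a
                for c, w, p, a in zip(coeff, pi_weight, pi, adv))


def loss_pi_attached(z, coeff, adv):
    # pi is both the expectation weight and the argument of the log,
    # and both factors get differentiated (the line as originally written)
    pi = softmax(z)
    return -sum(c * p * math.log(p) * a
                for c, p, a in zip(coeff, pi, adv))


def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference gradient of f with respect to z."""
    grad = []
    for j in range(len(z)):
        zp = list(z)
        zp[j] += eps
        zm = list(z)
        zm[j] -= eps
        grad.append((f(zp) - f(zm)) / (2 * eps))
    return grad


def target_grad(z, coeff, adv):
    # -sum_a coeff_a * pi_a * adv_a * d(log pi_a)/dz_j,
    # using d(log pi_a)/dz_j = (1 if a == j else 0) - pi_j for a softmax
    pi = softmax(z)
    n = len(z)
    return [-sum(coeff[a] * pi[a] * adv[a] * ((1.0 if a == j else 0.0) - pi[j])
                 for a in range(n))
            for j in range(n)]


z = [0.5, -0.2, 1.0]     # hypothetical policy logits for one state
coeff = [0.3, 0.0, 0.7]  # hypothetical [1 - c/rho(a)]_+ values, held constant
adv = [1.0, -0.5, 2.0]   # hypothetical Q(a) - V values, held constant

pi0 = softmax(z)
g_target = target_grad(z, coeff, adv)
g_detached = numerical_grad(lambda zz: loss_pi_detached(zz, pi0, coeff, adv), z)
g_attached = numerical_grad(lambda zz: loss_pi_attached(zz, coeff, adv), z)

err_detached = max(abs(a - b) for a, b in zip(g_detached, g_target))
err_attached = max(abs(a - b) for a, b in zip(g_attached, g_target))
print(f"max |detached - target| = {err_detached:.2e}")
print(f"max |attached - target| = {err_attached:.2e}")
```

With pi frozen, the finite-difference gradient matches the estimator up to numerical error; with pi left in the graph, the extra log(pi) * grad(pi) contribution makes the two disagree by a visible margin.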

Wow, you're correct.
Thanks for such a sharp comment.
I updated the code.