seungeunrho / minimalRL

Line 104 in 46f9b32

    
           loss2 = -correction_coeff * pi * torch.log(pi) * (q.detach()-v) # bias correction term

According to the original paper, the gradient of the bias correction term is defined as an expectation over actions sampled from pi. Since pi serves only as the probability weight in that expectation, it does not seem to be a target of optimization.
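For reference, here is the bias correction term as I recall it from the ACER paper. This is a transcription from memory, not a copy of the original figure, so the notation is an assumption: $\rho_t(a)$ is the importance weight, $c$ is the truncation threshold, and $x_t$ is the state.

$$
\mathbb{E}_{a \sim \pi}\left[ \left[ \frac{\rho_t(a) - c}{\rho_t(a)} \right]_+ \nabla_\theta \log \pi_\theta(a \mid x_t) \left( Q_{\theta_v}(x_t, a) - V_{\theta_v}(x_t) \right) \right]
= \sum_a \pi_\theta(a \mid x_t) \left[ \frac{\rho_t(a) - c}{\rho_t(a)} \right]_+ \nabla_\theta \log \pi_\theta(a \mid x_t) \left( Q_{\theta_v}(x_t, a) - V_{\theta_v}(x_t) \right)
$$

Read this way, the factor $\pi_\theta(a \mid x_t)$ in the sum is only the sampling weight, and the gradient acts on $\log \pi_\theta$ alone. In the quoted line, `correction_coeff` appears to correspond to the truncated ratio and `(q.detach()-v)` to $Q - V$.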

Shouldn't we detach pi from the computational graph in the line above?
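A minimal sketch of how one might check this numerically, assuming a softmax policy over three actions. The tensors `logits`, `adv`, and `coeff` are hypothetical toy stand-ins for the network output, `(q.detach()-v)`, and `correction_coeff`; none of the values come from the repository.

```python
import torch

# Hypothetical toy values for one state with 3 actions (not taken from the repo).
logits = torch.tensor([0.1, -0.3, 0.8], requires_grad=True)  # stand-in for policy output
adv = torch.tensor([0.5, -1.0, 2.0])    # stand-in for (q.detach() - v), a constant
coeff = torch.tensor([0.2, 0.0, 0.7])   # stand-in for correction_coeff, a constant


def bias_correction_grad(detach_pi: bool) -> torch.Tensor:
    """Gradient of the summed bias correction loss with respect to the logits."""
    pi = torch.softmax(logits, dim=0)
    # The only difference between the two variants is whether the
    # expectation weight is cut out of the computational graph.
    weight = pi.detach() if detach_pi else pi
    loss = (-coeff * weight * torch.log(pi) * adv).sum()
    (grad,) = torch.autograd.grad(loss, logits)
    return grad


grad_detached = bias_correction_grad(detach_pi=True)
grad_attached = bias_correction_grad(detach_pi=False)

# Analytic gradient of the expectation form. For a softmax policy,
# d log pi_a / d logit_j = 1[a == j] - pi_j, so with w_a = coeff_a * pi_a * adv_a
# the gradient of the loss is -(w_j - pi_j * sum_a w_a).
with torch.no_grad():
    pi = torch.softmax(logits, dim=0)
    w = coeff * pi * adv
    grad_expected = -(w - pi * w.sum())

print("detached matches expectation form:",
      torch.allclose(grad_detached, grad_expected, atol=1e-6))
print("attached matches expectation form:",
      torch.allclose(grad_attached, grad_expected, atol=1e-6))
```

If the expectation reading is right, only the detached variant should agree with the analytic gradient. Without the detach, autograd also differentiates through the weight and adds an extra term proportional to `log(pi) * grad(pi)`. That would suggest a change along the lines of `pi.detach() * torch.log(pi)` in the quoted line.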

Wow, you're correct.
Thanks for such a sharp comment.
I updated the code.

Wrong gradient flow in bias correction term of ACER?