Exercise 5.6
Hyperion-shuo opened this issue
Hyperion commented
The product in the importance-sampling ratio should start at t+1, not t.
We are estimating Q(s, a), so the action A_t = a is already given. There is no need to multiply in the ratio for the action at time t, just as there is no importance-sampling ratio in DQN.
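For concreteness, here is a minimal sketch of the proposed fix. It assumes an every-visit, weighted importance-sampling estimate of action values, analogous to the book's eq. (5.6) for state values: $Q(s,a) \doteq \sum_{t \in \mathcal{T}(s,a)} \rho_{t+1:T(t)-1} G_t \big/ \sum_{t \in \mathcal{T}(s,a)} \rho_{t+1:T(t)-1}$, where the ratio product starts at t+1. The function name `weighted_is_q`, the episode format, and the toy policies `pi` and `b` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Hashable, List, Tuple

# One step of an episode: (S_t, A_t, R_{t+1}). Hypothetical format for this sketch.
Step = Tuple[Hashable, Hashable, float]
# A policy maps (state, action) to the probability of taking that action.
Policy = Callable[[Hashable, Hashable], float]


def weighted_is_q(episodes: List[List[Step]], s: Hashable, a: Hashable,
                  pi: Policy, b: Policy, gamma: float = 1.0) -> float:
    """Every-visit weighted importance-sampling estimate of Q(s, a).

    Each return G_t is weighted by rho_{t+1:T-1}, the product of
    pi/b over the steps AFTER t; the ratio for A_t itself is omitted.
    """
    num = 0.0
    den = 0.0
    for episode in episodes:
        G = 0.0
        rho = 1.0  # rho_{t+1:T-1}; the empty product at t = T-1 is 1
        # Walk backward so G_t and rho_{t+1:T-1} can be built incrementally.
        for t in reversed(range(len(episode))):
            s_t, a_t, r = episode[t]
            G = gamma * G + r
            if (s_t, a_t) == (s, a):
                num += rho * G
                den += rho
            # Fold in A_t's ratio only after time t has been used,
            # so earlier steps see a product that starts at t+1.
            rho *= pi(s_t, a_t) / b(s_t, a_t)
    # Assumption of this sketch: report 0 when no weighted visits exist.
    return num / den if den > 0 else 0.0


# Toy example (hypothetical): target policy always picks "R",
# behavior policy picks "L" or "R" uniformly.
def pi(s: Hashable, a: Hashable) -> float:
    return 1.0 if a == "R" else 0.0


def b(s: Hashable, a: Hashable) -> float:
    return 0.5


episodes = [
    [("x", "L", 0.0), ("x", "R", 1.0)],
    [("x", "R", 0.0), ("x", "L", 5.0)],
]

q_left = weighted_is_q(episodes, "x", "L", pi, b)
q_right = weighted_is_q(episodes, "x", "R", pi, b)
```

Note that `q_left` is nonzero even though the target policy never picks "L". Starting the product at t would multiply in pi(L)/b(L) = 0 and discard every return that begins with "L", which is exactly the problem raised here.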
This answer may be helpful:
https://www.quora.com/Why-doesn-t-DQN-use-importance-sampling-Dont-we-always-use-this-method-to-correct-the-sampling-error-produced-by-the-off-policy/answer/James-MacGlashan
YIFAN WANG commented
Yep, you are right.
I will fix it. Thank you.