Exercise 5.6

Question

Hyperion-shuo opened this issue 4 years ago.

It seems there is no need to apply importance sampling to the action taken at step t.

Hyperion · Answer 1 · Fri May 01 2020 10:36:06 GMT+0800 (China Standard Time)

The subscript of the importance-sampling ratio should start at t+1. The action-value estimate already assumes that action A_t was taken, so there is no need to multiply in the ratio for the action at step t, just as DQN uses no importance-sampling ratio.
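Assuming this refers to Exercise 5.6 of Sutton and Barto's Reinforcement Learning: An Introduction (2nd ed.), which asks for the action-value analogue of the weighted importance-sampling estimate (5.6), the corrected estimator would look roughly like this. Here T(s, a) is the set of time steps at which the pair (s, a) is visited and T(t) is the termination time of the episode containing t. When t = T(t) - 1 the product is empty and equals 1.

```latex
Q(s,a) \doteq \frac{\sum_{t \in \mathcal{T}(s,a)} \rho_{t+1:T(t)-1}\, G_t}
                   {\sum_{t \in \mathcal{T}(s,a)} \rho_{t+1:T(t)-1}},
\qquad
\rho_{t+1:T(t)-1} = \prod_{k=t+1}^{T(t)-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}
```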
This answer may be helpful:
https://www.quora.com/Why-doesn-t-DQN-use-importance-sampling-Dont-we-always-use-this-method-to-correct-the-sampling-error-produced-by-the-off-policy/answer/James-MacGlashan
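A minimal Python sketch of the same point, assuming each episode is stored as a list of (state, action, reward) tuples where the reward is R_{t+1}, and that pi and b are callables returning action probabilities. All names here (weighted_is_q, pi, b, the toy states) are hypothetical. It computes an every-visit weighted importance-sampling estimate of Q(s, a) and multiplies in the ratio for step t only after using the weight, so the ratio for A_t never enters the estimate for (S_t, A_t):

```python
from collections import defaultdict


def weighted_is_q(episodes, pi, b, gamma=1.0):
    """Every-visit weighted importance-sampling estimate of Q(s, a).

    Hypothetical helper: episodes is a list of [(s, a, r), ...] with r = R_{t+1};
    pi(a, s) and b(a, s) return action probabilities under the target and
    behavior policies.
    """
    num = defaultdict(float)  # sum of rho_{t+1:T-1} * G_t per (s, a)
    den = defaultdict(float)  # sum of rho_{t+1:T-1} per (s, a)
    for episode in episodes:
        g = 0.0  # return G_t, accumulated backward
        w = 1.0  # rho_{t+1:T-1}; empty product (= 1) at the last step
        for s, a, r in reversed(episode):
            g = gamma * g + r
            num[(s, a)] += w * g
            den[(s, a)] += w
            # The ratio for step t is applied only to earlier time steps.
            w *= pi(a, s) / b(a, s)
    return {k: num[k] / den[k] for k in num if den[k] > 0}


# Toy data: the target policy always picks 'R'; the behavior policy is uniform.
def pi(a, s):
    return 1.0 if a == "R" else 0.0


def b(a, s):
    return 0.5


episodes = [
    [("s0", "L", 0.0), ("s1", "R", 1.0)],
    [("s0", "R", 0.0), ("s1", "R", 1.0)],
]
q = weighted_is_q(episodes, pi, b)
```

In the toy data the behavior policy picks L in state s0 even though the target policy never does. Because the ratio starts at t+1, Q(s0, L) still gets a finite estimate (1.0) instead of being weighted by zero, which is what a ratio starting at t would do.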

YIFAN WANG · Answer 2 · Sat May 02 2020 01:16:45 GMT+0800 (China Standard Time)

Yep, you are right.
I will fix it. Thank you.