LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction

exercise 5.6

Hyperion-shuo opened this issue

It seems there is no need to apply importance sampling to the action at step t.

The subscript of the importance sampling ratio should start at t+1.
Because the action A_t is already assumed to be taken, there is no need to multiply by the ratio for step t, just as there is no importance sampling ratio in DQN.
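
Written out, the suggested correction would presumably look like the following. This is a sketch in the book's notation, assuming \mathcal{T}(s,a) denotes the set of time steps at which the pair (s,a) is visited and T(t) the termination time of the episode containing step t:

```latex
% Weighted importance sampling estimate for action values.
% The ratio starts at t+1 because A_t is given, not sampled from b.
Q(s,a) \doteq
  \frac{\sum_{t \in \mathcal{T}(s,a)} \rho_{t+1:T(t)-1} \, G_t}
       {\sum_{t \in \mathcal{T}(s,a)} \rho_{t+1:T(t)-1}},
\qquad
\rho_{t+1:T(t)-1} \doteq \prod_{k=t+1}^{T(t)-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}
```

When t+1 > T(t)-1 the product is empty and the ratio equals 1.
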
This answer may be helpful:
https://www.quora.com/Why-doesn-t-DQN-use-importance-sampling-Dont-we-always-use-this-method-to-correct-the-sampling-error-produced-by-the-off-policy/answer/James-MacGlashan
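
To make the point concrete, here is a minimal Python sketch of an every-visit weighted importance sampling estimate of Q(s, a) in which the ratio starts at t+1. The names `tail_ratio` and `weighted_is_q`, the episode layout (a list of `(state, action, reward)` tuples), and the policy callables `pi` and `b` are all hypothetical choices made for illustration, not code from the repository.

```python
from typing import Callable, List, Tuple

# Assumed layout: (state, action, reward received after taking the action).
Step = Tuple[int, int, float]
Policy = Callable[[int, int], float]  # returns the probability of action in state


def tail_ratio(episode: List[Step], t: int, pi: Policy, b: Policy) -> float:
    """Product of pi/b over steps t+1 .. T-1; the empty product is 1.0."""
    rho = 1.0
    for state, action, _ in episode[t + 1:]:
        rho *= pi(state, action) / b(state, action)
    return rho


def weighted_is_q(episodes: List[List[Step]], s: int, a: int,
                  pi: Policy, b: Policy, gamma: float = 1.0) -> float:
    """Every-visit weighted importance sampling estimate of Q(s, a)."""
    num = 0.0
    den = 0.0
    for episode in episodes:
        for t, (state, action, _) in enumerate(episode):
            if (state, action) != (s, a):
                continue
            # Discounted return G_t from step t to the end of the episode.
            g = sum(gamma ** k * r for k, (_, _, r) in enumerate(episode[t:]))
            # Step t itself is skipped: the action A_t = a is given.
            rho = tail_ratio(episode, t, pi, b)
            num += rho * g
            den += rho
    return num / den if den > 0 else 0.0
```

Note that with this indexing, Q(s, a) can still be estimated for an action a that the target policy never selects in s, which would be impossible if the ratio for step t were included.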

Yep, you are right.
I will fix it. Thank you.