LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions to Reinforcement Learning: An Introduction


Exercise 6.1

Hyperion-shuo opened this issue

It looks like a typo. I will double-check tomorrow.
Thanks for your response.

I see. The definition of u is a typo: I should have written u_t = ... instead of u_{t+1}.
The answer has been updated. Thanks for your contribution.

This is still not solved, because

u_t = V_{t+1}(S_t) - V_t(S_t) ≠ V_{t+1}(S_{t+1}) - V_t(S_{t+1}),

and even if you apply u_{t+1} = V_{t+2}(S_{t+1}) - V_{t+1}(S_{t+1}), that is still not equal to V_{t+1}(S_{t+1}) - V_t(S_{t+1}).

For state S_{k+1}, u_{k+1} tells you about V_{k+2} and V_{k+1}, but the solution needs to express the difference between V_{k+1} and V_k.

The solution can never be written in terms of u_t as you have defined it.
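For reference, here is a sketch of the unrolled derivation (my notation, not the posted solution's): writing the per-step TD error as δ_k = R_{k+1} + γ V_k(S_{k+1}) - V_k(S_k) and taking the value of the terminal state to be 0, the correction term involves the change in the value of the *next* state between steps k and k+1, which is exactly the quantity that u_t as defined cannot express:

$$
\begin{aligned}
G_t - V_t(S_t) &= \delta_t + \gamma\bigl(G_{t+1} - V_{t+1}(S_{t+1})\bigr) + \gamma\bigl(V_{t+1}(S_{t+1}) - V_t(S_{t+1})\bigr) \\
&= \sum_{k=t}^{T-1} \gamma^{k-t}\,\delta_k \;+\; \sum_{k=t}^{T-1} \gamma^{k-t+1}\bigl(V_{k+1}(S_{k+1}) - V_k(S_{k+1})\bigr).
\end{aligned}
$$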

You can check the final equation; it must exactly match

G_t - V_t(S_t)
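To make that concrete, here is a minimal numerical sketch (mine, not from the repo; every number in it is made up for illustration). It builds a short episode with a value table V_k that changes at every step and checks that the Monte Carlo error G_0 - V_0(S_0) equals the discounted sum of TD errors plus the γ(V_{k+1}(S_{k+1}) - V_k(S_{k+1})) corrections:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
T = 5                                   # episode length; S_T is terminal
S = rng.integers(0, 3, size=T + 1)      # states S_0 .. S_T (labels 0..2)
R = rng.normal(size=T + 1)              # rewards; R[k + 1] plays the role of R_{k+1}

# V[k] is the value table in use at time k; it changes at every step,
# which is exactly the situation Exercise 6.1 asks about.
V = [rng.normal(size=3) for _ in range(T + 1)]

def v(k, t):
    """Value of S_t under the table used at time k; terminal value is 0."""
    return 0.0 if t == T else V[k][S[t]]

# Monte Carlo error at t = 0: G_0 - V_0(S_0)
G0 = sum(gamma ** k * R[k + 1] for k in range(T))
mc_error = G0 - v(0, 0)

# Sum of TD errors delta_k = R_{k+1} + gamma * V_k(S_{k+1}) - V_k(S_k),
# each plus the correction gamma * (V_{k+1}(S_{k+1}) - V_k(S_{k+1})).
td_sum = 0.0
for k in range(T):
    delta_k = R[k + 1] + gamma * v(k, k + 1) - v(k, k)
    correction = gamma * (v(k + 1, k + 1) - v(k, k + 1))
    td_sum += gamma ** k * (delta_k + correction)

print(np.isclose(mc_error, td_sum))     # expected: True
```

If the correction is replaced by γ·u_{k+1} with u as defined in the posted solution, i.e. γ(V_{k+2}(S_{k+1}) - V_{k+1}(S_{k+1})), the two sides generally no longer agree for value tables that differ at every step, which is the point of the comment above.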