LyWangPX / Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions

Solutions of Reinforcement Learning, An Introduction

Ex 10.6

tomasruizt opened this issue

Hi :)
I'm wondering about the step from the first line to the second.
With r(π) = 0.5 and E[R_{t+1} | S_0 = A] being either 1 or 0 (correct me if I'm wrong),
how can E[R_{t+1} | S_0 = A] - r(π) be (-0.5)^t?
For example, for t = 0: E[R_{t+1} | S_0 = A] - r(π) = 1 - 0.5 = 0.5, but (-0.5)^t = 1.
Am I missing something?

Thanks for your response. You have successfully found a typo.

I should have written (-1)^t / 2 instead of (-1/2)^t.

I will fix that in a min. :)
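
For anyone who wants to check the correction numerically, here is a minimal sketch. It assumes that, starting from A, the rewards alternate 1, 0, 1, 0, ... (which is what the t = 0 example and r(π) = 0.5 above suggest). The function names are made up for illustration and do not come from the solutions.

```python
AVERAGE_REWARD = 0.5  # r(pi), as stated in the question


def reward_from_a(t):
    """R_{t+1} when S_0 = A, assuming rewards alternate 1, 0, 1, 0, ..."""
    return 1 if t % 2 == 0 else 0


def centered_reward(t):
    """E[R_{t+1} | S_0 = A] - r(pi); the reward is deterministic under the assumption."""
    return reward_from_a(t) - AVERAGE_REWARD


def corrected_formula(t):
    """The corrected expression: (-1)^t / 2."""
    return (-1) ** t / 2


def original_formula(t):
    """The expression with the typo: (-1/2)^t."""
    return (-0.5) ** t


if __name__ == "__main__":
    # Columns: t, centered reward, corrected formula, formula with the typo
    for t in range(6):
        print(t, centered_reward(t), corrected_formula(t), original_formula(t))
```

Under that assumption the second and third columns agree for every t, while the last column already differs at t = 0 (it gives 1 instead of 0.5).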