Ex 4.1
StoyanVenDimitrov opened this issue · comments
StoyanVenDimitrov commented
Hi,
how do you come to value of state 11 being -14?
Qingwen Zhang commented
In example 4.1 This is an undiscounted episodic task, the reward is -1 on all transitions until the terminal state is reached
Since state 11 is not the terminal state so it's reward is -14