Ex 4.1

Question

StoyanVenDimitrov opened this issue 3 years ago · comments

Hi,

how do you come to value of state 11 being -14?

Qingwen Zhang · Answer 1 · Sun Mar 07 2021 08:59:06 GMT+0800 (China Standard Time)

In example 4.1 This is an undiscounted episodic task, the reward is -1 on all transitions until the terminal state is reached

Since state 11 is not the terminal state so it's reward is -14