moji1 / tp_rl

Why must the reward function be called before updating the observation in PointWiseEnv.py?

2017040264 opened this issue

1659071920738

Shouldn't the reward be an evaluation of the newly observed state S(t+1)?
For example, as in OpenAI Gym's CartPole environment.
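
To make the question concrete, here is a minimal sketch of the two orderings. It is not the code from PointWiseEnv.py: `ToyPointEnv`, its reward (negative distance to a goal), and both step methods are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyPointEnv:
    """Hypothetical 1-D environment (an assumption, not PointWiseEnv)."""
    position: float = 0.0
    goal: float = 3.0

    def _reward(self) -> float:
        # Negative distance to the goal, evaluated on the current state.
        return -abs(self.goal - self.position)

    def step_reward_after_update(self, action: float):
        # Ordering the question expects: the reward scores S(t+1).
        self.position += action
        reward = self._reward()
        return self.position, reward

    def step_reward_before_update(self, action: float):
        # Ordering the question asks about: the reward scores S(t),
        # the state the action was taken from.
        reward = self._reward()
        self.position += action
        return self.position, reward


env_a = ToyPointEnv()
env_b = ToyPointEnv()
obs_a, r_a = env_a.step_reward_after_update(1.0)
obs_b, r_b = env_b.step_reward_before_update(1.0)
print(obs_a, r_a)  # 1.0 -2.0
print(obs_b, r_b)  # 1.0 -3.0
```

Both calls return the same observation, but the rewards differ by one step: in the second ordering the agent is only credited for reaching a state on the following transition.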