Why must the reward function be called before updating the observation in PointWiseEnv.py?
2017040264 opened this issue
陈凡亮 commented
Shouldn't the reward be an evaluation of the observed new state S(t+1)?
For example, see openai_gym_cartpole.
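To make the question concrete, here is a minimal sketch of the two orderings. `ToyEnv`, its fields, and both step methods are hypothetical names made up for illustration; this is not the code in PointWiseEnv.py, and it only mimics the ordering I understand CartPole to use (transition first, then reward).

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical 1-D environment: the state is an integer position."""
    position: int = 0
    goal: int = 3

    def _reward(self) -> float:
        # The reward depends only on whatever self.position holds
        # at the moment this method is called.
        return 1.0 if self.position == self.goal else 0.0

    def step_reward_after_update(self, action: int):
        # CartPole-style ordering: apply the transition first,
        # then score the new state S(t+1).
        self.position += action
        reward = self._reward()
        return self.position, reward

    def step_reward_before_update(self, action: int):
        # Ordering asked about in this issue: score the current
        # state S(t), then apply the transition.
        reward = self._reward()
        self.position += action
        return self.position, reward


env_a = ToyEnv(position=2)
obs_a, reward_a = env_a.step_reward_after_update(1)

env_b = ToyEnv(position=2)
obs_b, reward_b = env_b.step_reward_before_update(1)

print(obs_a, reward_a)
print(obs_b, reward_b)
```

Both calls return the same observation, but only the first one rewards the agent for the state it has just reached; the second one still scores the state it started from. If the reward in PointWiseEnv.py reads information that the observation update would overwrite, that could be a reason for the second ordering, but that is only my guess.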