Changing when the vectorized environments return `done=True`

Question

alversafa opened this issue 4 years ago

I am trying to create vectorized environments that return done=True only after a certain number of episodes have passed, say 2. So in coinrun, when the agent dies (or gets the coin) for the first time, the step function should not return done=True. It should return done=True only at the end of the second episode.

More specifically, I am trying to create vectorized trials, rather than episodes, as in the RL^2 paper.

(It would also be great if I could return some additional information from the environment at every step.)

What would be the easiest way to achieve this?

Christopher Hesse · Answer 1 · Wed Apr 08 2020 11:50:42 GMT+0800 (China Standard Time)

You should be able to write a wrapper around the baselines VecEnv interface that overrides the value of done. Have you tried that yet?
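As an illustration of the wrapper idea (a minimal sketch, not code from this thread), the class below counts episode ends per environment and reports done=True only on every `episodes_per_trial`-th one. The class name `TrialDoneWrapper` and the parameter `episodes_per_trial` are hypothetical. It assumes only that the wrapped object exposes `num_envs`, `reset()`, and `step(actions)` returning `(obs, rews, dones, infos)`; with baselines itself you would more likely subclass `VecEnvWrapper` and apply the same logic in `step_wait`.

```python
import numpy as np


class TrialDoneWrapper:
    """Report done=True only after every `episodes_per_trial`-th episode end.

    Hypothetical sketch: `venv` is assumed to expose `num_envs`, `reset()`
    and `step(actions) -> (obs, rews, dones, infos)`.
    """

    def __init__(self, venv, episodes_per_trial=2):
        self.venv = venv
        self.num_envs = venv.num_envs
        self.episodes_per_trial = episodes_per_trial
        # Number of episodes finished so far in the current trial, per env.
        self.episode_counts = np.zeros(self.num_envs, dtype=np.int64)

    def reset(self):
        self.episode_counts[:] = 0
        return self.venv.reset()

    def step(self, actions):
        obs, rews, dones, infos = self.venv.step(actions)
        dones = np.asarray(dones, dtype=bool)
        # Count each underlying episode end.
        self.episode_counts += dones
        # A trial ends once enough episodes have finished in that env.
        trial_dones = self.episode_counts >= self.episodes_per_trial
        self.episode_counts[trial_dones] = 0
        return obs, rews, trial_dones, infos
```

The underlying environment still resets itself after each episode; the wrapper only hides the intermediate done signals from the caller.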

The more complicated way is to change the C++ code to behave how you want, which I would only recommend if there's no way to do this in Python.

Adding information to the info dictionary from the Python side is straightforward. Adding things from the C++ side is fairly complicated; see this comment: #32 (comment)
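To illustrate the Python-side approach (again a hedged sketch, not code from this thread), a wrapper can copy each per-environment info dict and attach extra keys before returning it. The class name `StepCountInfoWrapper` and the key `episode_step` are hypothetical, and the wrapped object is assumed to expose `num_envs`, `reset()`, and `step(actions)` returning `(obs, rews, dones, infos)` with `infos` as a list of dicts.

```python
class StepCountInfoWrapper:
    """Attach a per-environment step counter to each info dict.

    Hypothetical sketch: `venv` is assumed to expose `num_envs`, `reset()`
    and `step(actions) -> (obs, rews, dones, infos)`.
    """

    def __init__(self, venv):
        self.venv = venv
        self.num_envs = venv.num_envs
        self.step_counts = [0] * self.num_envs

    def reset(self):
        self.step_counts = [0] * self.num_envs
        return self.venv.reset()

    def step(self, actions):
        obs, rews, dones, infos = self.venv.step(actions)
        new_infos = []
        for i, (done, info) in enumerate(zip(dones, infos)):
            self.step_counts[i] += 1
            # Copy so the wrapped environment's dict is not mutated.
            info = dict(info)
            info["episode_step"] = self.step_counts[i]
            if done:
                # The next step belongs to a new episode.
                self.step_counts[i] = 0
            new_infos.append(info)
        return obs, rews, dones, new_infos
```

Any value computable in Python from the observations, rewards, or done flags can be added the same way.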

Does that answer your question?

Christopher Hesse · Answer 2 · Sat Apr 18 2020 05:22:27 GMT+0800 (China Standard Time)

Closing due to lack of activity.