openai / procgen

Procgen Benchmark: Procedurally-Generated Game-Like Gym-Environments

Home Page: https://openai.com/blog/procgen-benchmark/


New functionalities

hfeniser opened this issue · comments

Currently, in Procgen, (1) one cannot get the level id that is being played before the first action is taken. (2) One also cannot set the level id unless the environment has only one level; if you are dealing with many levels, you may have to recreate the environment from scratch for each level. (3) Last, one cannot specify an initial state other than the original initial state (e.g., a random valid state).

It would be cool to have these features in Procgen. Any comment on how to start adding these features would also be appreciated.
We need the third feature most urgently, since we have workarounds for the others. Would it be correct to assign an arbitrary valid observation as the initial observation instead of using env.reset()?
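For reference, the workaround mentioned for (2) can be sketched using procgen's documented `start_level` and `num_levels` Gym keyword arguments: recreating the environment with `num_levels=1` pins it to a single level seed. The helper name below is illustrative; only the keyword arguments come from the procgen interface.

```python
# Sketch of the workaround for (2): pin a procgen env to one level by
# recreating it with num_levels=1. The helper just assembles the
# arguments; the actual gym.make call is shown commented out so the
# sketch stays dependency-free.

def fixed_level_kwargs(game: str, level_seed: int) -> dict:
    """Gym.make arguments that pin a procgen env to a single level."""
    return {
        "id": f"procgen:procgen-{game}-v0",
        "start_level": level_seed,  # seed of the one level to play
        "num_levels": 1,            # restrict the env to that level only
    }

# import gym
# env = gym.make(**fixed_level_kwargs("maze", 42))
```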

I'm afraid I don't understand why you would want to do #3. What constitutes a random valid state? Is that different from a random initial state?

Actually, what I meant by a random valid state is a particular valid state from the state space. For example, in the maze game, valid states would correspond to different positions of the agent on the map; changing the walls or the cheese would be invalid.

It is different from a random initial state in the sense that I should be able to start the agent from a particular state of my interest. In other words, I want to observe the agent's behavior when it starts playing from a particular state.

You can refer to this sentence "Given the capacity to restart the agent in states corresponding to its past observations, ..." in this paper.

Oh, so is it sufficient to be able to save and restore environment state? You need an agent to produce the states that you want to save.

Yes, this should do, I guess. But when I restore a previously saved state, the reward should start from 0.
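The save/restore semantics being discussed could look something like the sketch below. The `get_state()`/`set_state()` method names are hypothetical (procgen did not expose them at the time of this thread), and a toy counter environment stands in for a real game so the sketch is runnable; the key point is that the episode return restarts at 0 after a restore.

```python
# Hypothetical save/restore wrapper. ToyEnv is a stand-in for a real
# game env; get_state/set_state are assumed, illustrative method names.

class ToyEnv:
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, {}  # obs, reward, done, info
    def get_state(self):
        return self.t
    def set_state(self, state):
        self.t = state

class RestoreAsInitial:
    """Restart episodes from a saved state, with the episode return at 0."""
    def __init__(self, env):
        self.env = env
        self.episode_return = 0.0
    def save(self):
        return self.env.get_state()
    def restore(self, snapshot):
        self.env.set_state(snapshot)
        self.episode_return = 0.0  # reward starts from zero after a restore
    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        self.episode_return += rew
        return obs, rew, done, info

# env = RestoreAsInitial(ToyEnv())
# env.env.reset(); env.step(0); snap = env.save(); env.restore(snap)
```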

Some of your requests may be addressed by the next version of procgen, though it won't be out for a few weeks.

OK, thank you very much for the responses. I will either implement it myself or find a workaround. I have another question related to initial states and Procgen.

From what I have read, diversifying the initial states can help the agent learn better (e.g., human starts in ALE). Do you see any reason that random initial states could not help with

  1. better training performance
  2. better generalization performance

in Procgen environments?
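One common way to diversify initial states, analogous to the random no-op starts used with ALE, is to take a random number of random actions after reset before handing control to the agent. The wrapper below is only a sketch of that idea for any Gym-style env, not part of the procgen API.

```python
# Sketch of random starts: take 0..max_random_steps random actions
# after reset so episodes begin from varied states.
import random

class RandomStart:
    def __init__(self, env, max_random_steps=30, seed=None):
        self.env = env
        self.max_random_steps = max_random_steps
        self.rng = random.Random(seed)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.rng.randint(0, self.max_random_steps)):
            obs, _, done, _ = self.env.step(self.env.action_space.sample())
            if done:  # don't hand the agent a terminal state
                obs = self.env.reset()
        return obs

    def step(self, action):
        return self.env.step(action)
```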

My first experiments with this resulted in much poorer training performance in the Starpilot game. I am currently reviewing my code in case there is a bug or something.

It probably helps more in ALE because there is basically only a single initial state.

In Procgen there is a fairly diverse set of initial states (one per level seed), and the set of available states is subjectively more diverse.

But this diversity may still not be enough for generalization (e.g., 500-level generalization). I hypothesize that if adding more levels helps generalization (as shown in the paper), then one could also diversify the initial states of the existing levels to get better generalization performance.
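The 500-level protocol referred to here can be expressed with the same `start_level`/`num_levels` keyword arguments procgen documents: train on a fixed set of level seeds and evaluate on seeds outside it. The helper below only assembles the arguments; the actual env creation is left as a comment.

```python
# Sketch of a train/test level split using procgen's documented kwargs.
# num_levels=0 means the env draws from an unbounded set of levels.

def level_split_kwargs(game: str, num_train_levels: int = 500):
    """Gym.make arguments for disjoint training and evaluation level sets."""
    train = {
        "id": f"procgen:procgen-{game}-v0",
        "start_level": 0,
        "num_levels": num_train_levels,   # fixed training set of seeds
    }
    test = {
        "id": f"procgen:procgen-{game}-v0",
        "start_level": num_train_levels,  # seeds disjoint from training
        "num_levels": 0,                  # unbounded evaluation levels
    }
    return train, test

# import gym
# train_kwargs, test_kwargs = level_split_kwargs("starpilot")
# train_env = gym.make(**train_kwargs)
```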

Could you also let us know if restoring states is likely to be included in the next version?

It's likely.

For now, we have implemented a workaround in our own fork.