openai / procgen

Procgen Benchmark: Procedurally-Generated Game-Like Gym-Environments

Home Page: https://openai.com/blog/procgen-benchmark/


New functionalities

hfeniser opened this issue · comments

Currently, in Procgen, (1) one cannot get the level id that is being played before the first action is taken. (2) One also cannot set the level id unless the environment has only one level; if you are dealing with many levels, you may have to recreate the environment from scratch for each level. (3) Last, one cannot specify an initial state other than the original initial state (e.g., a random valid state).

It would be cool to have these features in Procgen. Any comment on how to start adding these features would also be appreciated.
We need the third feature most urgently, since we have workarounds for the others. Would it be correct to assign an arbitrary valid observation as the initial observation instead of using env.reset()?
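For reference, the workaround mentioned for (2) can be sketched using procgen's documented `start_level` and `num_levels` Gym keyword arguments: recreating the environment with `num_levels=1` pins it to a single level seed. The helper name below is illustrative; only the keyword arguments come from the procgen interface.

```python
# Sketch of the workaround for (2): pin a procgen env to one level by
# recreating it with num_levels=1. The helper just assembles the
# arguments; the actual gym.make call is shown commented out so the
# sketch stays dependency-free.

def fixed_level_kwargs(game: str, level_seed: int) -> dict:
    """Gym.make arguments that pin a procgen env to a single level."""
    return {
        "id": f"procgen:procgen-{game}-v0",
        "start_level": level_seed,  # seed of the one level to play
        "num_levels": 1,            # restrict the env to that level only
    }

# import gym
# env = gym.make(**fixed_level_kwargs("maze", 42))
```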

I'm afraid I don't understand why you would want to do #3. What constitutes a random valid state? Is that different from a random initial state?

Actually, what I meant by a random valid state is a particular valid state from the state space. For example, in the maze game, valid states would correspond to different positions of the agent on the map; changing the walls or the cheese would be invalid.

It is different from a random initial state in the sense that I should be able to start the agent from a particular state of my interest. In other words, I want to observe the agent's behavior when it starts playing from a particular state.

You can refer to this sentence "Given the capacity to restart the agent in states corresponding to its past observations, ..." in this paper.

Oh, so is it sufficient to be able to save and restore environment state? You need an agent to produce the states that you want to save.

Yes, this should do, I guess. But when I restore a previously saved state, the reward should start from 0.
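The save/restore semantics being discussed could look something like the sketch below. The `get_state()`/`set_state()` method names are hypothetical (procgen did not expose them at the time of this thread), and a toy counter environment stands in for a real game so the sketch is runnable; the key point is that the episode return restarts at 0 after a restore.

```python
# Hypothetical save/restore wrapper. ToyEnv is a stand-in for a real
# game env; get_state/set_state are assumed, illustrative method names.

class ToyEnv:
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, {}  # obs, reward, done, info
    def get_state(self):
        return self.t
    def set_state(self, state):
        self.t = state

class RestoreAsInitial:
    """Restart episodes from a saved state, with the episode return at 0."""
    def __init__(self, env):
        self.env = env
        self.episode_return = 0.0
    def save(self):
        return self.env.get_state()
    def restore(self, snapshot):
        self.env.set_state(snapshot)
        self.episode_return = 0.0  # reward starts from zero after a restore
    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        self.episode_return += rew
        return obs, rew, done, info

# env = RestoreAsInitial(ToyEnv())
# env.env.reset(); env.step(0); snap = env.save(); env.restore(snap)
```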

Some of your requests may be addressed by the next version of procgen, though it won't be out for a few weeks.

OK, thank you very much for the responses. I will either implement it myself or find a workaround. I have another question related to initial states and Procgen.

From what I have read, diversifying the initial states can help the agent learn better (e.g., human starts in ALE). Do you see any reason that random initial states could not help with

  1. better training performance
  2. better generalization performance

in Procgen environments?
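One common way to diversify initial states, analogous to the random no-op starts used with ALE, is to take a random number of random actions after reset before handing control to the agent. The wrapper below is only a sketch of that idea for any Gym-style env, not part of the procgen API.

```python
# Sketch of random starts: take 0..max_random_steps random actions
# after reset so episodes begin from varied states.
import random

class RandomStart:
    def __init__(self, env, max_random_steps=30, seed=None):
        self.env = env
        self.max_random_steps = max_random_steps
        self.rng = random.Random(seed)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.rng.randint(0, self.max_random_steps)):
            obs, _, done, _ = self.env.step(self.env.action_space.sample())
            if done:  # don't hand the agent a terminal state
                obs = self.env.reset()
        return obs

    def step(self, action):
        return self.env.step(action)
```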

My first experiments with this resulted in much poorer training performance in the Starpilot game. I am currently reviewing my code in case there is a bug or something.

It probably helps more in ALE because there is basically only a single initial state.

In Procgen there is a fairly diverse set of initial states (one per level seed), and the set of available states is subjectively more diverse.

But this diversity may still not be enough for generalization (e.g., 500-level generalization). I hypothesize that if adding more levels helps generalization (as shown in the paper), then one could also diversify the initial states of the existing levels to get better generalization performance.
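The 500-level protocol referred to here can be expressed with the same `start_level`/`num_levels` keyword arguments procgen documents: train on a fixed set of level seeds and evaluate on seeds outside it. The helper below only assembles the arguments; the actual env creation is left as a comment.

```python
# Sketch of a train/test level split using procgen's documented kwargs.
# num_levels=0 means the env draws from an unbounded set of levels.

def level_split_kwargs(game: str, num_train_levels: int = 500):
    """Gym.make arguments for disjoint training and evaluation level sets."""
    train = {
        "id": f"procgen:procgen-{game}-v0",
        "start_level": 0,
        "num_levels": num_train_levels,   # fixed training set of seeds
    }
    test = {
        "id": f"procgen:procgen-{game}-v0",
        "start_level": num_train_levels,  # seeds disjoint from training
        "num_levels": 0,                  # unbounded evaluation levels
    }
    return train, test

# import gym
# train_kwargs, test_kwargs = level_split_kwargs("starpilot")
# train_env = gym.make(**train_kwargs)
```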

Could you also let us know if restoring states is likely to be included in the next version?

It's likely.

For now, we have implemented a workaround in our own fork.