Kautenja / gym-super-mario-bros

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES

How to play all stages in `SuperMarioBros-v0` but using models trained in separate stages?

terryzhao127 opened this issue

Is your feature request related to a problem? Please describe.

I've trained models to play through the game, but there are 32 of them, one for each of the 32 separate stages. I cannot have the trained agents play SuperMarioBros-v0 from the first stage to the last. In other words, is there a way to make the agent play in an environment with no skipped frames, where the frames that would normally be skipped are flagged through info? A complete game could then be rendered from the start screen to the final frame in which Mario saves the princess.

Describe the solution you'd like

The modified environment should be run like this:

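    state = env.reset()
    info = {}  # no info is available before the first step
    done = False
    while not done:
        # should_skip is the requested helper: True on frames the stock env would skip
        if not should_skip(info):
            action = model(state)
        else:
            action = 0  # NOOP
        state, reward, done, info = env.step(action)
        env.render()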

Describe alternatives you've considered

Actually I've tried two ways to solve the problem:

  1. Add a _should_skip_during_steps function to smb_env.py, called by _get_info.
    With this method, the agent performs a NOOP whenever a should-be-skipped step is found. _should_skip_during_steps is implemented using the skip conditions from the self._skip_xxx() functions. However, skipping with NOOPs does not behave the same way as the self._skip_xxx() functions: more frames are skipped, so the first frame after the skipped frames is not exactly the same as the one returned by a plain SuperMarioBros-v0 environment. This seemingly small difference matters to the reinforcement learning model and causes it to fail in the gameplay that follows.
  2. Ignore the skipped frames: just play in SuperMarioBros-v0 and swap the loaded model every time a new stage is detected in info.
    With this method, the agent starts to fail after entering 1-2 from 1-1. Apparently the beginning state in SuperMarioBros-1-2-v0 is not the same as the state returned by SuperMarioBros-v0 after the skipped frames.
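For reference, the second alternative can be sketched roughly as follows. This is a minimal illustration, not code from the repository: it assumes info exposes `world` and `stage` keys and that the environment follows the classic Gym API (`reset()` returns a state, `step()` returns a 4-tuple). The names `select_policy`, `play`, and the `policies` dictionary are hypothetical. It only shows the model-swapping mechanism and does not fix the mismatched starting state described above.

```python
from typing import Any, Callable, Dict, List, Tuple

Policy = Callable[[Any], int]  # maps an observation to a discrete action index
StageKey = Tuple[Any, Any]     # (world, stage) as reported in info


def select_policy(policies: Dict[StageKey, Policy], info: Dict[str, Any],
                  current: Policy) -> Policy:
    """Return the policy for the (world, stage) in info, else keep the current one."""
    key = (info.get("world"), info.get("stage"))
    return policies.get(key, current)


def play(env, policies: Dict[StageKey, Policy], max_steps: int = 10_000) -> List[StageKey]:
    """Run one episode, swapping the policy whenever info reports a new stage.

    Returns the sequence of distinct (world, stage) pairs that were visited.
    """
    state = env.reset()
    policy = policies[(1, 1)]  # assumption: the full game starts at world 1, stage 1
    visited: List[StageKey] = []
    for _ in range(max_steps):
        state, reward, done, info = env.step(policy(state))
        key = (info.get("world"), info.get("stage"))
        if not visited or visited[-1] != key:
            visited.append(key)
        # Swap to the model trained on the stage the game is now in.
        policy = select_policy(policies, info, policy)
        if done:
            break
    return visited
```

In practice `policies` would hold the 32 trained models keyed by `(world, stage)`, and `env` would be the SuperMarioBros-v0 environment. The swap takes effect on the step after the new stage first appears in info.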

Can anyone help me run the code on Windows 10? I am unable to run it. Which file is the main one? I have satisfied all the requirements!