Kautenja / gym-super-mario-bros

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES

Graphical Glitch After Call to reset

EliasHasle opened this issue · comments

Describe the bug

When rendering to the screen, everything is usually OK, but sometimes the images stop updating properly for a while. In my experience this happens after a lot of resets, but that could be a coincidence. The creatures behave normally, and Mario seems to interact with the actual environment, but the background and object images flicker between two of the first images, one of them clipped. This is in level 1-1 running 'SuperMarioBros-v0' (frameskip 4), rendering with env.render('human') on a COMPLEX_MOVEMENT environment.

Notice clipped objects and ground:
image

Here, Mario is actually between the tubes, but looks like he is somewhere else without tubes:
image

I don't mind that the visualization is wrong, and to be honest my current agent doesn't either, as it is blind. But if this bug also affects observations, then it matters to other agents.

Environment

  • Operating System: Windows 10
  • Python version: 3.6.2
  • nes-py version: 2.0
  • gym-super-mario-bros version: 4.0.2

hmmm this is a puzzling one. The cause likely lies in the save state / restore mechanism in the underlying NES emulator this project is built on. render just copies the last observation produced by a call to reset or step. So what it returns / draws is the actual game-play (every 4th frame in this case). I've seen similar issues with reset documented in #49 where the graphics would get completely destroyed, but Mario, sprites, and bounding boxes would all function as expected. Fortunately, I built the emulator so I'm familiar with fixing that bug and have some ideas as to how to approach this one. I know these bugs can be challenging to reproduce, but if you could somehow reproduce the issue, that would be very helpful in fixing it.

Since the issue seems to appear randomly, it would be helpful for reproduction to have some overview and control of the pseudorandom states. Does nes-py use pseudorandom numbers, and if so, is the state accessible? Likewise for gym-super-mario-bros. (I don't have time to inspect the sources now.)
Moreover, does env.reset trigger randomness?

What else should I log? Can I log the NES states? If so, how?

Note that I discovered the bug because it persists over multiple episodes, maybe hundreds. That also means you don't have to look at the screen all the time to discover it, but of course if you log everything, log size can become an issue (if kept in memory).

This script reproduces what I have done, but without the optimization. The policy employed is an optimized blind agent. I added logging of the number of resets, but I guess that is not sufficient to reliably reproduce the bug. But control over all pseudorandom states, including the seed for random.random, would, I think. :-)

import gym_super_mario_bros
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import random

#All levels, starting at 1-1. With frameskip 4.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
#Else use SuperMarioBrosNoFrameskip-v0

env = BinarySpaceToDiscreteSpaceEnv(env, COMPLEX_MOVEMENT)
env.reset()

policy = [0, 0, 0, 0.277, 0.723, 0, 0, 0, 0, 0, 0]
def sample_action(pol):
    r = random.random() #not seeded yet
    s = 0
    a = -1
    while s < r:
        a += 1
        s += pol[a]
    return a

resets = 0
while True:
    observation,reward,done,info = env.step(sample_action(policy))
    env.render('human')
    if done:
        env.reset()
        resets += 1
env.close()

The point with using an optimized blind policy (that sometimes even completes the level) rather than a uniformly random policy, is that this will make the bug more apparent, and also matches better with the setup where I encountered the bug in the first place. On second thought, maybe the stochastic policy shouldn't have zero probability for any actions, as the probabilities were non-zero (but converging towards zero) during the training where I encountered the bug.

OK, somewhere before 1120 resets, and still at 1120 resets, the bug resurfaced (with the script quoted above).

>>> info
{'coins': 0, 'flag_get': False, 'life': 2, 'score': 0, 'stage': 1, 'time': 386, 'world': 1, 'x_pos': 594}

I tried to export the observation, but python froze. I understand that this information is not very helpful, but at least I have reproduced the problem with a very simple setup. It can be narrowed down further by logging and controlling the randomness. And also, if we could find some observation markers/features for automatically detecting the bug and outputting the last few NES states and pseudorandom states, that would be great. (I don't know how to do that yet.)
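One way to keep "the last few pseudorandom states" without unbounded memory use is a fixed-size ring buffer of generator states. The sketch below uses only the standard library (`random.Random.getstate` / `setstate` and `collections.deque`); the class name `RngHistory` and its methods are made up for illustration and are not part of nes-py or gym-super-mario-bros. It covers only Python-side randomness, not the emulator state.

```python
import random
from collections import deque


class RngHistory:
    """Hypothetical helper: remember the last `maxlen` states of a generator."""

    def __init__(self, rng, maxlen=100):
        self.rng = rng
        self.states = deque(maxlen=maxlen)  # old states are dropped automatically

    def random(self):
        # Record the state *before* drawing, so the draw can be replayed later.
        self.states.append(self.rng.getstate())
        return self.rng.random()

    def rewind(self, steps_back):
        """Restore the generator to where it was `steps_back` draws ago."""
        self.rng.setstate(self.states[-steps_back])


rng = random.Random(1)
history = RngHistory(rng, maxlen=50)
draws = [history.random() for _ in range(200)]

# Go back ten draws and replay them directly from the underlying generator.
history.rewind(10)
replayed = [rng.random() for _ in range(10)]
```

In a reproduction script, `history.random()` would take the place of `random.random()` inside `sample_action`, and the buffer could be dumped to disk once the glitch is spotted.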

While watching the test carefully for the first few minutes, I spotted another small graphical glitch, where the ground flickered in a mix of colors for just a few frames. That one could perhaps be just a rendering problem. Didn't seem very serious.

I am trying not to spend too much time on this, but... In smb_env.py, it looks like the start screen is skipped immediately and the game state saved, and then every subsequent episode will start from exactly the same state (unless save/load is broken, which can be suspected now). If I understand right, this means that pseudorandomness plays no role in this environment, even though it does in real SMB, where allegedly the time offset at which you press start at the start screen will affect some details.
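If every episode really starts from the same saved state, the first frame returned by each reset should be identical, which suggests a cheap automatic check: fingerprint the frame from the first reset and compare every later one against it. This is only a sketch under that assumption; `frame_digest` and `ResetChecker` are hypothetical names, and a corruption that does not show up in the very first frame would go unnoticed.

```python
import hashlib


def frame_digest(frame):
    """Return a fingerprint of a frame (NumPy array or bytes-like object)."""
    data = frame.tobytes() if hasattr(frame, "tobytes") else bytes(frame)
    return hashlib.sha1(data).hexdigest()


class ResetChecker:
    """Compare the first frame of every episode against the first one seen."""

    def __init__(self):
        self.reference = None
        self.resets = 0
        self.mismatches = []  # indices of resets whose first frame differed

    def check(self, frame):
        digest = frame_digest(frame)
        if self.reference is None:
            self.reference = digest
        matches = digest == self.reference
        if not matches:
            self.mismatches.append(self.resets)
        self.resets += 1
        return matches


# Stand-in frames for demonstration; with the real environment the call
# would be checker.check(env.reset()) on every reset.
good_frame = bytes(16)
glitched_frame = bytes([255]) + bytes(15)
checker = ResetChecker()
results = [checker.check(f)
           for f in (good_frame, good_frame, glitched_frame, good_frame)]
```

A mismatch index then tells which reset to look at, without anyone having to watch the screen.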

This happens to me too! I thought it was just me. It sometimes fixes itself eventually.

sorry for the delay, quite busy with many projects these days, @EliasHasle thanks for all the work you put into this bug, the script that can reproduce will be very helpful in addressing the issue. You are also correct in that randomness plays no role in the environment other than the randomness in the SMB game itself. It would be excellent to find a way to control the RNG of the NES, but I'm not sure how (or if) this is possible. As a side-note, the technique of inserting a random number (uniform between 1 and 20) of NOPs cited in the original Deep-Q paper can simulate this probabilistic start-screen behavior. Regarding the graphics, unless pyglet has a serious bug (kinda doubtful), all issues in rendering can be attributed to bugs in the NES emulator (nes-py). As such, I suspect a thorough review of the code over there is warranted to inject some logging functionality and potentially uncover any obvious errors. I have some time this weekend so I'll be looking into things.
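The random-NOP idea mentioned above could be sketched as a small wrapper that plays a random number of no-op steps after every reset. This is not code from either project: the class name `RandomNoopStart` is hypothetical, it assumes action index 0 is a no-op in the wrapped action set, and it uses the 1 to 20 range quoted above. A private `random.Random` instance is used so that the global `random` stream, which the seeded reproduction scripts depend on, is left untouched.

```python
import random


class RandomNoopStart:
    """Hypothetical wrapper: follow each reset with 1..20 no-op steps."""

    def __init__(self, env, noop_action=0, min_noops=1, max_noops=20, rng=None):
        self.env = env
        self.noop_action = noop_action  # assumption: index 0 means "do nothing"
        self.min_noops = min_noops
        self.max_noops = max_noops
        # Private generator, so the global `random` module is not disturbed.
        self.rng = rng if rng is not None else random.Random()

    def reset(self):
        observation = self.env.reset()
        noops = self.rng.randint(self.min_noops, self.max_noops)
        for _ in range(noops):
            observation, _reward, done, _info = self.env.step(self.noop_action)
            if done:  # unlikely, but keep the env in a valid state
                observation = self.env.reset()
        return observation

    def step(self, action):
        return self.env.step(action)


class CountingEnv:
    """Stand-in environment whose observation is just the step count."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.steps

    def step(self, action):
        self.steps += 1
        return self.steps, 0.0, False, {}


env = RandomNoopStart(CountingEnv(), rng=random.Random(0))
starts = [env.reset() for _ in range(5)]  # number of no-ops played per reset
```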

Alright. Adding
random.seed(1)
before the loop triggers the bug after exactly five episodes (fifteen lives), all of them early in 1-1. See if you can find out what happens. :-)

import gym_super_mario_bros
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import random
import time

#All levels, starting at 1-1. With frameskip 4.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
#Else use SuperMarioBrosNoFrameskip-v0

env = BinarySpaceToDiscreteSpaceEnv(env, COMPLEX_MOVEMENT)
env.reset()

policy = [0, 0, 0, 0.277, 0.723, 0, 0, 0, 0, 0, 0]
def sample_action(pol):
    r = random.random()
    s = 0
    a = -1
    while s < r:
        a += 1
        s += pol[a]
    return a

theseed = 1 #Here is a fast seed I found
random.seed(theseed)
resets = 0
while True:
    observation,reward,done,info = env.step(sample_action(policy))
    #This was added to show the last life before the bug in human speed
    if resets==4 and info["life"]==1:
        time.sleep(1/15)
    env.render('human')
    if done:
        env.reset()
        resets += 1
        #Run at most 20 episodes on each given seed.
        #If interrupted shortly after seeing the bug,
        #starting from the last seed is likely to
        #reproduce it fast. Although this is not 100%
        #certain, it may be a good place to start.
        if resets == 20:
            theseed += 1
            random.seed(theseed)
            resets = 0
env.close()

I don't see anything suspicious... The only finding is that the bug in this case occurs right after a death and end of an episode, that is on an environment reset. I suppose now it is within reach to inspect the last interactions in detail.

wow that's fantastic, 100% reproducible on my end. Will explore some changes to the NES emulator to get this fixed. Some truly strange behavior starts happening. Goombas are occasionally spawning at the top of the screen as if they're being thrown at Mario lol. And indeed it does work its way out of the situation eventually so some aspect of the machine's state must be getting corrupted and then restored later on by in-game triggers.

I think the falling Goombas are from later in the level, consistent with Mario interacting as usual. Maybe the game starts working again when Mario visits 1-2?

You can safely skip "Many" in the name of this issue. I got an idea to count the number of interactions before the perhaps crucial last episode before the bug shows up, and instead of doing the interactions just burn that many randoms. It worked. The error shows after one single episode. Test script below:

import gym_super_mario_bros
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import random
import time

#All levels, starting at 1-1. With frameskip 4.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = BinarySpaceToDiscreteSpaceEnv(env, COMPLEX_MOVEMENT)
env.reset()

#Use the "good" seed that I found:
random.seed(1)
#708: Burn the same number of randoms that would be used
#before the crucial episode, to recreate the episode. WORKED!
#Added 88: Also burn the number of randoms spent during first two lives,
#to see if the bug can be reproduced even when making the same 
#actions on the first life. Did NOT work.
for i in range(708):#+88):
    random.random()

resets = 0
while True:
    a = 3 if random.random() < 0.277 else 4
    observation,reward,done,info = env.step(a)
    #Show crucial episode in human time
    if resets==0:
        time.sleep(1/15)
    env.render('human')
    if done:
        env.reset()
        resets += 1
env.close()

Adding 88 to the burn-in actually discloses another graphical glitch with the ground flashing in colors (mentioned by me earlier) after the "checkpoint". The glitches may of course be related somehow. Perhaps the sequence of actions during what was the last life before the glitch in the script above trigger the other glitch when played in the first life.

Instead adding 169 to the burn-in jumps to the episode where the bug would show, but does not trigger the bug. So there is indeed some connection to the actions made in the previous episode (which should be independent).

One way I would suggest to understand the bug better, would be to find the episode before the bug disappears again, and try to skip all episodes in between, by (similarly) counting actions and burning the same number of randoms, to see if the bug also disappears as a consequence of actions taken in a single episode.

Otherwise one may perhaps try to alter late actions in the last bug-free episode, to see where the bug happens?
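The burn-in trick relies on knowing how many random draws each episode consumed. A minimal sketch of that bookkeeping, using a `random.Random` subclass that counts its own draws, is shown here; `CountingRandom` and `burn` are hypothetical names, and the episode lengths are made up rather than taken from a real run.

```python
import random


class CountingRandom(random.Random):
    """random.Random that counts how many times random() was called."""

    def __init__(self, seed=None):
        super().__init__(seed)
        self.draws = 0

    def random(self):
        self.draws += 1
        return super().random()


def burn(rng, n):
    """Discard n draws, to skip ahead in the pseudorandom stream."""
    for _ in range(n):
        rng.random()


# Record how many draws each (pretend) episode used.
rng = CountingRandom(1)
episode_draws = []
for episode_length in (5, 8, 3):  # made-up lengths, one draw per step
    start = rng.draws
    for _ in range(episode_length):
        rng.random()
    episode_draws.append(rng.draws - start)
next_value = rng.random()  # first draw of the following episode

# A fresh generator with the same seed can jump straight to that episode.
fresh = random.Random(1)
burn(fresh, sum(episode_draws))
skipped_value = fresh.random()
```

With per-episode counts logged, skipping to any episode is a matter of burning the sum of the counts before it.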

Finally have some time to get back to this issue. Oddly the scripts fail to produce any errors when I run them on MacOS. I normally work on Ubuntu (which does successfully reproduce all issues) so I hadn't noticed this before. I'm assuming the pseudo-random states are just different on MacOS.

Because the call to reset restores the emulator to a saved state (created after skipping through the start screen), my thought is that some piece of information is not being overwritten during the state restore. It has graphical ramifications so my inclination is that it's constrained to the PPU, but it could be the CPU being corrupted and issuing garbage to the PPU. I've refactored the code for both PPU and CPU for the NES emulator to resolve some really bad smells that will make it easier to mess with the state restore code. Ultimately, it will be best to implement some very basic unit tests for the NES that ensure that state backup and restore operations function properly. @EliasHasle thanks again for diligence with this, your scripts will be a god-send for fixing this.
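A very basic backup/restore unit test of the kind described above might take the following shape. The `ToyMachine` here is a stand-in with made-up CPU and PPU fields, not the nes-py emulator, and `backup` / `restore` are illustrative method names. `LeakyMachine` deliberately forgets to restore the PPU, to show that the check catches a restore that leaves some state behind.

```python
import copy


class ToyMachine:
    """Stand-in for an emulator with separate CPU and PPU state."""

    def __init__(self):
        self.cpu = {"pc": 0, "a": 0}
        self.ppu = {"scroll_x": 0, "vram": [0] * 8}
        self._snapshot = None

    def step(self, value):
        self.cpu["pc"] += 1
        self.cpu["a"] = value
        self.ppu["scroll_x"] = (self.ppu["scroll_x"] + value) % 256
        self.ppu["vram"][self.cpu["pc"] % 8] = value

    def state(self):
        return {"cpu": self.cpu, "ppu": self.ppu}

    def backup(self):
        # Deep copy, so later steps cannot mutate the snapshot.
        self._snapshot = copy.deepcopy(self.state())

    def restore(self):
        snapshot = copy.deepcopy(self._snapshot)
        self.cpu, self.ppu = snapshot["cpu"], snapshot["ppu"]


class LeakyMachine(ToyMachine):
    """Deliberately broken: restore() overwrites the CPU but not the PPU."""

    def restore(self):
        self.cpu = copy.deepcopy(self._snapshot["cpu"])


def check_round_trip(machine, actions):
    """Back up, run some steps, restore, and report whether state matches."""
    machine.backup()
    expected = copy.deepcopy(machine.state())
    for action in actions:
        machine.step(action)
    machine.restore()
    return machine.state() == expected


good_ok = check_round_trip(ToyMachine(), [1, 2, 3])
leaky_ok = check_round_trip(LeakyMachine(), [1, 2, 3])
```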

No problem! Just remember to not start messing with other uses of python's random yet (e.g. to randomize start screen waiting time), because that will break the examples. Happy coding! :-)

I've had no luck so far, sadly. Time is tight these days, but I'll be working on the issue in my spare time. Will certainly report back once things are fixed and deployed. Anyone following this issue can check out the nes-py repository (where the bug likely derives from) to follow progress or experiment with potential solutions.

Hello! I want to ask: is this problem solved?

@ian840512 happy to inform that it is fixed now 👍. I'm pushing the new version rn. @EliasHasle thanks again for working out those debug scripts! They were way helpful to try fixes out. Closing issue as (hopefully) this is the final resolution.

released as 5.0.1

Wow!!! I was just thinking that I would need to use the ppaquette version. Thanks!!!

Great news! Thanks! 🥇 And merry Christmas to you too!

BTW, pip called the newest version 6.0.1. Is that right?

And a happy new year! Yes 6.0.1 is correct. There was a minor change (that is still technically API breaking for somebody – life in the info dict is decremented by 1 now to be less mathy {1, ..., 3} and more compy sciy {0, ..., 2}), and another small fix.

python_2018-12-29_19-38-20
When I run env.reset several times, I get a rendering error :(
Prior to the update, it was like the first screenshot of the issue thread, but after updating to 6.0.1, the same thing happens. Is there a solution?

nope! happy coding to anyone that wants to give it a crack. I will not be returning to this issue.

@bic4907 Does this happen often? Do you have a minimal script to reliably reproduce the issue, something like the one I provided above?

The current bug is how this graphical glitch appeared before the backup/restore feature. Unless there is a bug in the emulator itself, I suspect this all relates to sloppy memory management in the functionally decomposed NES emulator. I've tried refactoring it to OO to better handle creating and restoring game states and memory in general, but it's a nontrivial fix, particularly in the CPU namespace, because the original author uses C++ templates in a painfully esoteric fashion. A reproduction script will be helpful for whoever has time to try and fix this.

@bic4907 Now I have seen a variant of that bug too. Graphics go all crazy, but Mario looks quite normal. Don't know how often it happens. @Kautenja Could it be related to the improvement of laiNES by which you avoid an extra screen copy per frame?

@Kautenja
I notice that with the newest version a lot of time is spent within the intro screen. I thought the intro screen should be kept out. Also, it doesn't look a lot like my agent playing. I suspect that the env is actually stuck in the intro screen or something. I am sorry to say that I think the old version was actually better.

Update: The second behavior described seems to appear only when running envs in multiple (supposedly independent) processes. Hm.

It seems the actions may be reordered, which explains the poor performance of my prespecified policy. Using a newly optimized policy, the results look reasonable for a while. But, as you can see, the new graphical glitch is very much present (as well as frequent hangups). This run started with 16 independent processes, each with their own window, and over time all of them got the bug, and the five remaining ones hang too.
image

Behavior confirmed in 6.0.1 and 5.0.1, whereas 5.0.0 neither hangs nor has the red/black stripes bug, but has the bug described in the original post.

If I use version 5.0.0, is there some way to avoid this bug?
I see that somebody has been able to use this Mario version and train correctly:
https://github.com/jcwleo/mario_rl

The bug in 5.0.0 is a disturbance, but will probably not completely break all RL approaches. Noisy and erroneous data are common in ML.

My approach, inspired by Ha&Schmidhuber (https://worldmodels.github.io) involves converting frames to a low-dimensional embedding and working on those embeddings for sequential analysis. The bug in 5.0.0 may imprint some nearly static features in the autoencoder distribution, which is not desirable. (But H&S demonstrates a good example of learning from a noisy environment, namely one "hallucinated" from the dynamics learned on the embeddings.)

Good news! I found a good object oriented NES implementation using C++ 11 that I'm porting as a new back-end for nes-py. Once that's done this bug should no longer be a problem over here :). 5.0.0 and 6.0.1 bugs occur at roughly the same rate and both resolve after some unknown number of episodes. The behavior in 5.0.0 is a normal game, but all background pixels are missing, so sprites still render fine. If we're talking CNN based techniques, this could influence the feature extractors to focus on sprite features as they will render naturally. However, it will also incline the agent to "memorize" the path as holes and platforms will likely not render. For 6.0.1, there is a game playing on top of another or something? But I imagine this would really confuse a CNN based solution and lead more towards memorizing paths. But as @EliasHasle mentioned, the noisy data in this area is common and could actually improve the performance through some unintuitive means! When bugs are resolved, it will be interesting to conduct a study as to how these bugs impact various RL techniques.

I am happy to hear that you are not giving up on this project. I have just discovered openAI's Retro Gym, which also supports SMB. I will try that out too, but I don't know what to expect from such a general framework, in terms of cutscene removal and other game-specific optimizations. Perhaps a viable approach could be making a wrapper environment (around the retro environment) that skips uninteresting sequences and converts to enumerated controls etc, if necessary and sufficiently performant.

@EliasHasle I have the same bug in that version. But I don't know how the glitch happens.
When multiple threads were used for testing, almost all games eventually showed 'no response'.
Even in the random action sample code, this problem was found and appears to be in the random number of episodes.

@EliasHasle I made up some academic reasons so I could feel better about spending time on it lol. Also, I ran into open ai retro shortly after porting LaiNES for this project (bit of a facepalm). It may be better or worse depending on how they set it up though.

@bic4907 if by threads you mean Python's abstraction in the threading.Thread package, that's because Thread in this case is still contained in the same process and the NES emulator's back-end currently has globally namespaced data structures. So, you'd need separate processes for each instance of the emulator using multiprocessing.Process. That will be resolved by the new back-end.
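A toy illustration of why globally namespaced data breaks multiple emulator instances inside one process: two objects that write to the same module-level buffer overwrite each other, while objects that own their buffer do not. The classes are hypothetical stand-ins and say nothing about how the actual back-end is laid out. Separate processes sidestep the problem because each process gets its own copy of the module-level data.

```python
# Module-level buffer, standing in for globally namespaced emulator data.
_GLOBAL_RAM = [0] * 4


class GlobalStateEmulator:
    """Every instance reads and writes the same module-level buffer."""

    def write(self, address, value):
        _GLOBAL_RAM[address] = value

    def read(self, address):
        return _GLOBAL_RAM[address]


class InstanceStateEmulator:
    """Every instance owns its buffer, so instances are independent."""

    def __init__(self):
        self.ram = [0] * 4

    def write(self, address, value):
        self.ram[address] = value

    def read(self, address):
        return self.ram[address]


# Two "emulators" sharing global state interfere with each other...
shared_a, shared_b = GlobalStateEmulator(), GlobalStateEmulator()
shared_a.write(0, 42)
leaked_value = shared_b.read(0)

# ...while instances with their own state do not.
own_a, own_b = InstanceStateEmulator(), InstanceStateEmulator()
own_a.write(0, 42)
isolated_value = own_b.read(0)
```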

@Kautenja I used the multiprocessing package for testing. This issue seems to occur in this sample code as well. My environment is Windows 10 64-bit with package version 6.0.1. Is it stable in Ubuntu? Or do you have a stable package version to try? Thank you for your effort :)

Interesting, I've not tested much on Windows beyond ensuring that I could get it installed, compiled, and have all unit tests pass. Ubuntu and MacOS have worked pretty reliably for me with all versions, but I've been running serial processes. Does the issue occur simultaneously across all instances, like do they all fail at the same time? And if so, is any actuation of individual agents conditioned on a random seed and is the seed set to different values between processes? I'd not seen the issue in the sample script yet, does it occur after a certain number of steps? Also no problem!

Regarding stability, there is a really old version somewhere using a really slow NES back-end that is free of this glitch. Unfortunately, it has been a while since, and I have no clue which versions of gym-super-mario-bros and nes-py you'd need to match up to make that work. But, it is an option. It's a fork of the ppaquette version that has some improvements (including parallelism support). However, if you can wait around 2 weeks, I should have the new back-end up and running by then with (hopefully) no glitches, fast execution speeds, and maybe some bonus features :)

I believe @bic4907 intends to say that it happens irrespective of multiprocessing. Quote:

When multiple threads were used for testing, almost all games eventually showed 'no response'.
Even in the random action sample code, this problem was found and appears to be in the random number of episodes.

@EliasHasle Yes, it also happens in a single process (not using multiprocessing).

@Kautenja Testing on Windows 10 and Ubuntu 16.04 reveals that the process dies irregularly. These issues frequently show up before 500 episodes: first the glitch occurs on the screen, then the back-end dies after a while. I'll have to test it again with the new back-end.

I have also experienced several hangups on sky blue screen, possibly without first getting the red-black stripe pattern glitch (but I am not sure).

Let's hope that with the new backend, either the problems are gone or they are much easier to debug. :-)

Also, the red/black pattern is the most common appearance of the glitch. Sometimes it looks different, with red rings in a grid pattern on part of the screen etc., but it seems to converge to the most common pattern after a while.

I've seen the blue screen as well, but can't reproduce that one atm. It's odd that the glitch produces this uniform pattern. @EliasHasle have you tried building nes-py from source with the fix you introduced to LaiNES? I haven't been by my workstation to properly inject that and try it out, but will be able to tomorrow afternoon.

No, I haven't tried that. I just thought it could be of interest to you, for your academic project... :-) Indeed, it will be best for a valid comparison between the buggy and bug-free version if they are based on the same back-end.

Also, I ran into open ai retro shortly after porting LaiNES for this project (bit of a facepalm). It may be better or worse depending on how they set it up though.
Regarding stability, there is a really old version somewhere using a really slow NES back-end that is free of this glitch.

I believe OpenAI use a similar backend to the one you used before. Out of curiosity, I checked source file names for both, and OpenAI appears to use a variant of FCEUX, which is at least a relative of the FCEU you used. I am hoping to see some insane speedups with the new backend. :)

I don't know if this is a stupid question/request, but I would like to be able to limit the number of internal screen renderings of the emulator through the environment object, if that is possible. This would be in addition to skipping internal rendering during interaction frameskips (which I guess you are planning already, if it is at all possible). An application of this control could be higher-order non-uniform frameskip during frame-sampling procedures, sticky actions, etc.

Hmm, yea if they're using FCEU they either have to have plugged into the cpp with ctypes (or whatever python to cpp framework they prefer) or written a lua script as a general purpose interface between retro and FCEU (how ppaquette gym-smb works). If it's the latter, you can expect it to be pretty slow. I can't speak for the former, but OpenAI surely has the engineering man hours to make that sort of change happen. I spent days trying to port FCEUX for use with ctypes before giving up because the codebase is huge and hard to follow.

Not a stupid request at all! I think some means of control for frame skipping is an excellent idea and one that is supported by the research. Ultimately, I think the best option (perhaps at the cost of some performance) is to remove frame-skipping from the environment entirely, and instead introduce this functionality using an external wrapper class. This way, users could design their own implementation of the feature without resorting to polymorphism or having to register tons of different environments with gym. It will also be easier to test and have better style since to me, frame skipping in the env is functional envy. Great idea!
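Frame skipping as an external wrapper could look roughly like the sketch below: repeat the chosen action for a fixed number of frames, sum the rewards, and stop early when the episode ends. It is written against the plain `reset()` / `step()` interface used in the scripts in this thread rather than any particular gym base class; `FrameSkip` and `StubEnv` are hypothetical names, and the stub only exists to make the example runnable without an emulator.

```python
class FrameSkip:
    """Hypothetical wrapper: repeat each action for `skip` frames."""

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be at least 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            observation, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:  # do not step past the end of an episode
                break
        return observation, total_reward, done, info


class StubEnv:
    """Stand-in env: reward 1 per frame, episode ends after a fixed length."""

    def __init__(self, episode_length=10):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.episode_length, {}


env = FrameSkip(StubEnv(episode_length=10), skip=4)
env.reset()
results = [env.step(0) for _ in range(3)]
```

Because the skip count lives in the wrapper, a non-uniform or sticky-action variant would only need a different `step` method, with no extra environments to register.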

Retro connects directly to cpp. I have spent some time today trying to make a wrapper for retro that makes the experience very similar to the one with yours, borrowing some of your tricks for skipping busy frames: https://gist.github.com/EliasHasle/5958fceccbb10f8279c860caa8c31534
I have not compared performance yet, but it looks OK. The available memory interface requires reading and writing the whole state at once, which I guess is slower than modifying single addresses. It is still fast, but the narrow interface does not allow all the hacking I imagine one can do to improve performance further, e.g. deeper frame skipping (by which I mean sparing pure visualization calculations in the backend, if they exist. They are perhaps contained in the emulator's get_screen method already.).

The new back-end is in place with all basic features :) I'll release the code sometime today under 3.0.0, hopefully with this bug completely resolved. I haven't run a true performance test yet, but this back-end seems quite a bit faster. It was definitely easier to work with! Will close the issue for now, but won't be surprised to see it open back up lol.

whoops it'll be version 6.0.2 actually. nes-py is the one at 3.0.0.

Using tqdm to measure execution time:

from nes_py import NESEnv
import tqdm
env = NESEnv('./nes_py/tests/games/smb1.nes')

done = True

try:
    for _ in tqdm.tqdm(range(5000)):
        if done:
            state = env.reset()
            done = False
        else:
            state, reward, done, info = env.step(env.action_space.sample())
except KeyboardInterrupt:
    pass

I get 456 iterations per second using 2.0.0 of nes-py and 537 iterations per second using 3.0.0, an increase of roughly 18% (537 / 456 ≈ 1.18).

A similar test with retro reveals that it performs about 3.5 times better, both with and without rendering to screen. I almost decided to use that instead, but it turned out not to support multiple emulator instances per process. I am happy to see that the newest version of nes-py is so flexible with regard to this. I am considering having multiple environment instances in each thread too, e.g. to send batches of frames from independent games through the same neural network. I may end up using both nes-py and retro, actually, for different things (I must take care to have compatible reward functions if rewards will be used). Good job! :-)

Yea I'm not sure quite yet what trickery gets retro up to such a high frame rate. I was able to squeak out another 100 FPS from nes-py just by rearranging the C++ code here and there, so it's likely that more free performance gains are hidden in the code. Will continue reading up on C++ and injecting optimizations where possible. It would be great to dethrone their emulator!