Graphical Glitch After Call to reset

EliasHasle opened this issue 6 years ago · comments

Describe the bug

When rendering to the screen, everything is usually OK, but sometimes the images stop updating properly for a while. In my experience this happens after a lot of resets, though that could be a coincidence. The creatures behave normally, and Mario seems to interact with the actual environment, but the background and object images flicker between two of the first, one of them clipped. This is in level 1-1 running 'SuperMarioBros-v0' (frameskip 4), rendering with env.render('human') on a COMPLEX_MOVEMENT environment.

Notice clipped objects and ground:

Here, Mario is actually between the tubes, but looks like he is somewhere else without tubes:

I don't mind that the visualization is wrong, and to be honest my current agent doesn't either, as it is blind. But if this bug also affects observations, then it matters to other agents.

Environment

Operating System: Windows 10
Python version: 3.6.2
nes-py version: 2.0
gym-super-mario-bros version: 4.0.2

Christian Kauten · Answer 1 · Wed Oct 10 2018 05:01:42 GMT+0800 (China Standard Time)

hmmm this is a puzzling one. The cause likely lies in the save state / restore mechanism in the underlying NES emulator this project is built on. render just copies the last observation produced by a call to reset or step. So what it returns / draws is the actual game-play (every 4th frame in this case). I've seen similar issues with reset documented in #49 where the graphics would get completely destroyed, but Mario, sprites, and bounding boxes would all function as expected. Fortunately, I built the emulator so I'm familiar with fixing that bug and have some ideas as to how to approach this one. I know these bugs can be challenging to reproduce, but if you could somehow reproduce the issue, that would be very helpful in fixing it.

Elias Hasle · Answer 2 · Wed Oct 10 2018 14:22:03 GMT+0800 (China Standard Time)

Since the issue seems to appear randomly, it would be helpful for reproduction to have some overview and control of the pseudorandom states. Does nes-py use pseudorandom numbers, and if so, is the state accessible? Likewise for gym-super-mario-bros. (I don't have time to inspect the sources now.)
Moreover, does env.reset trigger randomness?

What else should I log? Can I log the NES states? If so, how?

Elias Hasle · Answer 3 · Wed Oct 10 2018 14:43:13 GMT+0800 (China Standard Time)

Note that I discovered the bug because it persists over multiple episodes, maybe hundreds. That also means you don't have to look at the screen all the time to discover it, but of course if you log everything, log size can become an issue (if kept in memory).

This script reproduces what I have done, but without the optimization. The policy employed is an optimized blind agent. I added logging of the number of resets, but I guess that is not sufficient to reliably reproduce the bug. But control over all pseudorandom states, including the seed for random.random, would, I think. :-)

import gym_super_mario_bros
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import random

#All levels, starting at 1-1. With frameskip 4.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
#Else use SuperMarioBrosNoFrameskip-v0

env = BinarySpaceToDiscreteSpaceEnv(env, COMPLEX_MOVEMENT)
env.reset()

policy = [0, 0, 0, 0.277, 0.723, 0, 0, 0, 0, 0, 0]
def sample_action(pol):
    r = random.random() #not seeded yet
    s = 0
    a = -1
    while s < r:
        a += 1
        s += pol[a]
    return a

resets = 0
while True:
    observation,reward,done,info = env.step(sample_action(policy))
    env.render('human')
    if done:
        env.reset()
        resets += 1
env.close()

The point with using an optimized blind policy (that sometimes even completes the level) rather than a uniformly random policy, is that this will make the bug more apparent, and also matches better with the setup where I encountered the bug in the first place. On second thought, maybe the stochastic policy shouldn't have zero probability for any actions, as the probabilities were non-zero (but converging towards zero) during the training where I encountered the bug.
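
For anyone adapting the script: the sample_action loop above is inverse-CDF sampling over the policy. Below is a minimal sketch of the same idea that cannot run past the end of the list if floating-point rounding leaves the cumulative sum slightly below the drawn number. The name sample_action_safe and the clamping to actions with non-zero probability are assumptions for illustration, not part of the original script.

```python
import random
from bisect import bisect_left
from itertools import accumulate

policy = [0, 0, 0, 0.277, 0.723, 0, 0, 0, 0, 0, 0]

def sample_action_safe(pol, rng=random):
    """Inverse-CDF sampling: first index whose cumulative probability >= r."""
    cumulative = list(accumulate(pol))
    # Indices of actions that can actually occur under this policy.
    possible = [i for i, p in enumerate(pol) if p > 0]
    r = rng.random()  # exactly one draw per action, like the original loop
    index = bisect_left(cumulative, r)
    # Clamp so rounding error (or r == 0.0) never selects an impossible action.
    return max(possible[0], min(index, possible[-1]))
```

Because it still consumes exactly one random number per action, swapping it in should not shift the random stream that the seed-based reproduction depends on.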

Elias Hasle · Answer 4 · Wed Oct 10 2018 16:03:30 GMT+0800 (China Standard Time)

OK, somewhere before 1120 resets, and still at 1120 resets, the bug resurfaced (with the script quoted above).

>>> info
{'coins': 0, 'flag_get': False, 'life': 2, 'score': 0, 'stage': 1, 'time': 386, 'world': 1, 'x_pos': 594}

I tried to export the observation, but python froze. I understand that this information is not very helpful, but at least I have reproduced the problem with a very simple setup. It can be narrowed down further by logging and controlling the randomness. And also, if we could find some observation markers/features for automatically detecting the bug and outputting the last few NES states and pseudorandom states, that would be great. (I don't know how to do that yet.)
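
One way to keep the log size bounded while still being able to rewind the Python-side randomness is a fixed-length ring buffer of recent records. This is only a sketch: RecentHistory is a hypothetical helper, and it captures the state of Python's random module only, not the NES state.

```python
import random
from collections import deque

class RecentHistory:
    """Keep only the last `maxlen` (reset_count, action, rng_state) records."""

    def __init__(self, maxlen=1000):
        # A deque with maxlen silently drops the oldest record when full.
        self.records = deque(maxlen=maxlen)

    def record(self, resets, action, rng_state):
        self.records.append((resets, action, rng_state))

    def oldest_rng_state(self):
        """RNG state captured just before the oldest remembered action."""
        return self.records[0][2]
```

In the loop, one would call random.getstate() just before sampling each action and pass the result to record(). Once the glitch is spotted, random.setstate(history.oldest_rng_state()) rewinds the generator to the start of the remembered window.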

While watching the test carefully for the first few minutes, I spotted another small graphical glitch, where the ground flickered in a mix of colors for just a few frames. That one could perhaps be just a rendering problem. Didn't seem very serious.

Elias Hasle · Answer 5 · Wed Oct 10 2018 17:39:34 GMT+0800 (China Standard Time)

I am trying not to spend too much time on this, but... In smb_env.py, it looks like the start screen is skipped immediately and the game state saved, and then every subsequent episode will start from exactly the same state (unless save/load is broken, which can be suspected now). If I understand right, this means that pseudorandomness plays no role in this environment, even though it does in real SMB, where allegedly the time offset at which you press start at the start screen will affect some details.
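
If that reading is right and every episode starts from exactly the same saved state, then the frame returned by env.reset() should be identical every time, which gives a cheap automatic check for a broken restore. A minimal sketch, assuming the observation is either bytes-like or exposes tobytes() (as NumPy arrays do); ResetFrameChecker is a hypothetical helper, not part of nes-py:

```python
import hashlib

def frame_fingerprint(frame):
    """Hash the raw frame bytes; accepts bytes or anything with .tobytes()."""
    raw = frame.tobytes() if hasattr(frame, "tobytes") else bytes(frame)
    return hashlib.sha1(raw).hexdigest()

class ResetFrameChecker:
    """Remember the first reset frame and compare later ones against it."""

    def __init__(self):
        self.reference = None

    def check(self, frame):
        """Return True if `frame` matches the first reset frame seen."""
        fingerprint = frame_fingerprint(frame)
        if self.reference is None:
            self.reference = fingerprint
            return True
        return fingerprint == self.reference
```

Calling checker.check(env.reset()) after every episode and logging the reset count on the first False would flag the episode where the restored state starts to differ, without anyone watching the screen.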

rtang23 · Answer 6 · Thu Oct 11 2018 11:12:04 GMT+0800 (China Standard Time)

This happens to me too! I thought it was just me. It sometimes fixes itself eventually.

Christian Kauten · Answer 7 · Fri Oct 12 2018 04:24:41 GMT+0800 (China Standard Time)

sorry for the delay, quite busy with many projects these days, @EliasHasle thanks for all the work you put into this bug, the script that can reproduce will be very helpful in addressing the issue. You are also correct in that randomness plays no role in the environment other than the randomness in the SMB game itself. It would be excellent to find a way to control the RNG of the NES, but I'm not sure how (or if) this is possible. As a side-note, the technique of inserting a random number (uniform between 1 and 20) of NOPs cited in the original Deep-Q paper can simulate this probabilistic start-screen behavior. Regarding the graphics, unless pyglet has a serious bug (kinda doubtful), all issues in rendering can be attributed to bugs in the NES emulator (nes-py). As such, I suspect a thorough review of the code over there is warranted to inject some logging functionality and potentially uncover any obvious errors. I have some time this weekend so I'll be looking into things.
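
The random-NOP idea could be sketched as a thin wrapper around the environment. This is an illustration under assumptions: RandomNoopReset is a hypothetical class, action index 0 is assumed to be the no-op, and step is assumed to return the (observation, reward, done, info) tuple used elsewhere in this thread. It draws from its own random.Random instance so the global random stream used by the reproduction scripts is left untouched.

```python
import random

class RandomNoopReset:
    """Wrap an env so each reset is followed by a random number of no-ops."""

    def __init__(self, env, noop_action=0, min_noops=1, max_noops=20, rng=None):
        self.env = env
        self.noop_action = noop_action  # assumed index of the no-op action
        self.min_noops = min_noops
        self.max_noops = max_noops
        # Private generator: does not consume from the module-level `random`.
        self.rng = rng or random.Random()

    def reset(self):
        observation = self.env.reset()
        # randint is inclusive on both ends: uniform in [min_noops, max_noops].
        for _ in range(self.rng.randint(self.min_noops, self.max_noops)):
            observation, _reward, done, _info = self.env.step(self.noop_action)
            if done:  # unlikely, but never hand back a finished episode
                observation = self.env.reset()
        return observation

    def step(self, action):
        return self.env.step(action)
```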

Elias Hasle · Answer 8 · Fri Oct 12 2018 06:01:55 GMT+0800 (China Standard Time)

No worries. :-) I thought the time spent on the start screen determined the randomness alone. According to a video I watched about SMB speedrunning, the best runners control carefully when they press start in order to get "fortunate" patterns in the game. In a RL setting I suppose it makes some sense to randomize this waiting by default to avoid overfitting.

Elias Hasle · Answer 9 · Fri Oct 12 2018 13:19:31 GMT+0800 (China Standard Time)

Alright. Adding
random.seed(1)
before the loop triggers the bug after exactly five episodes (fifteen lives), all of them early in 1-1. See if you can find out what happens. :-)

Elias Hasle · Answer 10 · Fri Oct 12 2018 13:30:15 GMT+0800 (China Standard Time)

import gym_super_mario_bros
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import random
import time

#All levels, starting at 1-1. With frameskip 4.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
#Else use SuperMarioBrosNoFrameskip-v0

env = BinarySpaceToDiscreteSpaceEnv(env, COMPLEX_MOVEMENT)
env.reset()

policy = [0, 0, 0, 0.277, 0.723, 0, 0, 0, 0, 0, 0]
def sample_action(pol):
    r = random.random()
    s = 0
    a = -1
    while s < r:
        a += 1
        s += pol[a]
    return a

theseed = 1 #Here is a fast seed I found
random.seed(theseed)
resets = 0
while True:
    observation,reward,done,info = env.step(sample_action(policy))
    #This was added to show the last life before the bug in human speed
    if resets==4 and info["life"]==1:
        time.sleep(1/15)
    env.render('human')
    if done:
        env.reset()
        resets += 1
        #Run at most 20 episodes on each given seed.
        #If interrupted shortly after seeing the bug,
        #starting from the last seed is likely to
        #reproduce it fast. Although this is not 100%
        #certain, it may be a good place to start.
        if resets == 20:
            theseed += 1
            random.seed(theseed)
            resets = 0
env.close()

I don't see anything suspicious... The only finding is that the bug in this case occurs right after a death and end of an episode, that is on an environment reset. I suppose now it is within reach to inspect the last interactions in detail.

Christian Kauten · Answer 11 · Sun Oct 14 2018 02:52:17 GMT+0800 (China Standard Time)

wow that's fantastic, 100% reproducible on my end. Will explore some changes to the NES emulator to get this fixed. Some truly strange behavior starts happening. Goombas are occasionally spawning at the top of the screen as if they're being thrown at Mario lol. And indeed it does work its way out of the situation eventually so some aspect of the machine's state must be getting corrupted and then restored later on by in-game triggers.

Elias Hasle · Answer 12 · Sun Oct 14 2018 02:58:10 GMT+0800 (China Standard Time)

I think the falling Goombas are from later in the level, consistent with Mario interacting as usual. Maybe the game starts working again when Mario visits 1-2?

Elias Hasle · Answer 13 · Thu Oct 18 2018 03:31:10 GMT+0800 (China Standard Time)

You can safely skip "Many" in the name of this issue. I got an idea to count the number of interactions before the perhaps crucial last episode before the bug shows up, and instead of doing the interactions just burn that many randoms. It worked. The error shows after one single episode. Test script below:

import gym_super_mario_bros
from gym_super_mario_bros.actions import COMPLEX_MOVEMENT
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
import random
import time

#All levels, starting at 1-1. With frameskip 4.
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = BinarySpaceToDiscreteSpaceEnv(env, COMPLEX_MOVEMENT)
env.reset()

#Use the "good" seed that I found:
random.seed(1)
#708: Burn the same number of randoms that would be used
#before the crucial episode, to recreate the episode. WORKED!
#Added 88: Also burn the number of randoms spent during first two lives,
#to see if the bug can be reproduced even when making the same 
#actions on the first life. Did NOT work.
for i in range(708):#+88):
    random.random()

resets = 0
while True:
    a = 3 if random.random() < 0.277 else 4
    observation,reward,done,info = env.step(a)
    #Show crucial episode in human time
    if resets==0:
        time.sleep(1/15)
    env.render('human')
    if done:
        env.reset()
        resets += 1
env.close()
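
The burn-in works because Python's generator is deterministic for a given seed: discarding N draws leaves it in the same state as spending N draws on actions. As a sketch of an alternative that avoids counting, the generator state at the start of the crucial episode can be captured once and restored directly. The burn helper and the use of pickle here are assumptions for illustration.

```python
import pickle
import random

def burn(rng, count):
    """Advance `rng` by discarding `count` draws."""
    for _ in range(count):
        rng.random()

# Reach the start of the crucial episode the slow way, once.
counted = random.Random(1)
burn(counted, 708)
snapshot = pickle.dumps(counted.getstate())  # could be written to a file

# Later: jump straight to the same point without counting anything.
restored = random.Random()
restored.setstate(pickle.loads(snapshot))
```

Both generators now produce the same sequence from here on, so the snapshot can stand in for the seed plus burn-in count.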

Elias Hasle · Answer 14 · Thu Oct 18 2018 03:45:30 GMT+0800 (China Standard Time)

Adding 88 to the burn-in actually discloses another graphical glitch with the ground flashing in colors (mentioned by me earlier) after the "checkpoint". The glitches may of course be related somehow. Perhaps the sequence of actions during what was the last life before the glitch in the script above triggers the other glitch when played in the first life.

Instead adding 169 to the burn-in jumps to the episode where the bug would show, but does not trigger the bug. So there is indeed some connection to the actions made in the previous episode (which should be independent).

One way I would suggest to understand the bug better, would be to find the episode before the bug disappears again, and try to skip all episodes in between, by (similarly) counting actions and burning the same number of randoms, to see if the bug also disappears as a consequence of actions taken in a single episode.

Otherwise one may perhaps try to alter late actions in the last bug-free episode, to see where the bug happens?

Christian Kauten · Answer 15 · Sat Oct 20 2018 09:59:27 GMT+0800 (China Standard Time)

Finally have some time to get back to this issue. Oddly the scripts fail to produce any errors when I run them on MacOS. I normally work on Ubuntu (which does successfully reproduce all issues) so I hadn't noticed this before. I'm assuming the pseudo-random states are just different on MacOS.

Because the call to reset restores the emulator to a saved state (created after skipping through the start screen), my thought is that some piece of information is not being overwritten during the state restore. It has graphical ramifications so my inclination is that it's constrained to the PPU, but it could be the CPU being corrupted and issuing garbage to the PPU. I've refactored the code for both PPU and CPU for the NES emulator to resolve some really bad smells that will make it easier to mess with the state restore code. Ultimately, it will be best to implement some very basic unit tests for the NES that ensure that state backup and restore operations function properly. @EliasHasle thanks again for your diligence with this, your scripts will be a godsend for fixing this.
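
A round-trip test for backup/restore could look roughly like the sketch below. Everything in it is hypothetical: ToyMachine stands in for the emulator, and LeakyMachine models the suspected class of bug, where one field (here a PPU scroll value) is not overwritten on restore.

```python
class ToyMachine:
    """Hypothetical stand-in for an emulator with CPU RAM and PPU state."""

    def __init__(self):
        self.ram = [0] * 8
        self.ppu_scroll = 0
        self._backup = None

    def step(self, value):
        self.ram[value % 8] += value
        self.ppu_scroll = (self.ppu_scroll + value) % 256

    def state(self):
        """Return a copy of everything that defines the machine."""
        return {"ram": list(self.ram), "ppu_scroll": self.ppu_scroll}

    def backup(self):
        self._backup = self.state()

    def restore(self):
        self.ram = list(self._backup["ram"])
        self.ppu_scroll = self._backup["ppu_scroll"]

class LeakyMachine(ToyMachine):
    """Same machine, but restore forgets the PPU field."""

    def restore(self):
        self.ram = list(self._backup["ram"])

def restore_round_trips(machine, inputs):
    """Backup, run some inputs, restore, and compare against the backup."""
    machine.backup()
    before = machine.state()
    for value in inputs:
        machine.step(value)
    machine.restore()
    return machine.state() == before
```

The check passes for ToyMachine and fails for LeakyMachine, which is the kind of signal such a unit test would give if a real field were being missed.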

Elias Hasle · Answer 16 · Sat Oct 20 2018 13:42:40 GMT+0800 (China Standard Time)

No problem! Just remember to not start messing with other uses of python's random yet (e.g. to randomize start screen waiting time), because that will break the examples. Happy coding! :-)

Christian Kauten · Answer 17 · Fri Nov 02 2018 12:37:46 GMT+0800 (China Standard Time)

I've had no luck so far sadly. Time is tight these days, but I'll be working on the issue in spare time. Will certainly report back once things are fixed and deployed. Anyone following this issue can check out the nes-py repository to follow progress (where the bug likely derives from) or experiment with potential solutions.

ian840512 · Answer 18 · Fri Dec 28 2018 20:44:22 GMT+0800 (China Standard Time)

Hello! I want to ask: has this problem been solved?

Christian Kauten · Answer 19 · Sat Dec 29 2018 09:29:15 GMT+0800 (China Standard Time)

@ian840512 happy to inform that it is fixed now 👍. I'm pushing the new version rn. @EliasHasle thanks again for working out those debug scripts! They were way helpful to try fixes out. Closing issue as (hopefully) this is the final resolution.

Christian Kauten · Answer 20 · Sat Dec 29 2018 09:35:00 GMT+0800 (China Standard Time)

released as 5.0.1

ian840512 · Answer 21 · Sat Dec 29 2018 13:46:59 GMT+0800 (China Standard Time)

Wow!!! I was just thinking that I would need to use the ppaquette version. Thanks!!!!!!!!!!

Elias Hasle · Answer 22 · Sat Dec 29 2018 15:25:18 GMT+0800 (China Standard Time)

Great news! Thanks! 🥇 And merry Christmas to you too!

BTW, pip called the newest version 6.0.1. Is that right?

Christian Kauten · Answer 23 · Sat Dec 29 2018 15:38:25 GMT+0800 (China Standard Time)

And a happy new year! Yes 6.0.1 is correct. There was a minor change (that is still technically API breaking for somebody – life in the info dict is decremented by 1 now to be less mathy {1, ..., 3} and more compy sciy {0, ..., 2}), and another small fix.

Baek In-Chang · Answer 24 · Sat Dec 29 2018 18:42:18 GMT+0800 (China Standard Time)

When I run env.reset several times, I get a rendering error :(
Prior to the update it looked like the first screenshot in this issue thread, and after updating to 6.0.1 the same thing still happens. Is there a solution?

Christian Kauten · Answer 25 · Sat Dec 29 2018 23:52:32 GMT+0800 (China Standard Time)

nope! happy coding to anyone that wants to give it a crack. I will not be returning to this issue.

Elias Hasle · Answer 26 · Sun Dec 30 2018 00:11:00 GMT+0800 (China Standard Time)

@bic4907 Does this happen often? Do you have a minimal script to reliably reproduce the issue, something like the one I provided above?

Christian Kauten · Answer 27 · Sun Dec 30 2018 00:54:35 GMT+0800 (China Standard Time)

The current bug is how this graphical glitch was before the backup/restore feature. Unless there is a bug in the emulator itself, I suspect this all relates to sloppy memory mgmt in the functionally decomposed NES emulator. I've tried refactoring it to OO to better handle creating and restoring game states and memory in general, but it's a nontrivial fix, particularly in the CPU namespace bc the OG author uses C++ templates in a painfully esoteric fashion. A reproduction script will be helpful for whoever has time to try and fix this.

Elias Hasle · Answer 28 · Mon Dec 31 2018 15:52:32 GMT+0800 (China Standard Time)

@bic4907 Now I have seen a variant of that bug too. Graphics go all crazy, but Mario looks quite normal. Don't know how often it happens. @Kautenja Could it be related to the improvement of laiNES by which you avoid an extra screen copy per frame?

@Kautenja
I notice that with the newest version a lot of time is spent within the intro screen. I thought the intro screen should be kept out. Also, it doesn't look a lot like my agent playing. I suspect that the env is actually stuck in the intro screen or something. I am sorry to say that I think the old version was actually better.

Update: The second behavior described seems to appear only when running envs in multiple (supposedly independent) processes. Hm.

Elias Hasle · Answer 29 · Mon Dec 31 2018 17:54:38 GMT+0800 (China Standard Time)

It seems the actions may be reordered, which explains the poor performance of my prespecified policy. Using a newly optimized policy, the results look reasonable for a while. But, as you can see, the new graphical glitch is very much present (as well as frequent hangups). This run started with 16 independent processes, each with their own window, and over time all of them got the bug, and the five remaining ones hang too.

Behavior confirmed in 6.0.1 and 5.0.1, whereas 5.0.0 neither hangs nor has the red/black stripes bug, but has the bug described in the original post.

ian840512 · Answer 30 · Mon Dec 31 2018 23:57:42 GMT+0800 (China Standard Time)

If I use version 5.0.0, is there some way to avoid this bug?
I see that somebody has used this Mario version and trained correctly:
https://github.com/jcwleo/mario_rl

Elias Hasle · Answer 31 · Tue Jan 01 2019 00:02:08 GMT+0800 (China Standard Time)

The bug in 5.0.0 is a disturbance, but will probably not completely break all RL approaches. Noisy and erroneous data are common in ML.

My approach, inspired by Ha&Schmidhuber (https://worldmodels.github.io) involves converting frames to a low-dimensional embedding and working on those embeddings for sequential analysis. The bug in 5.0.0 may imprint some nearly static features in the autoencoder distribution, which is not desirable. (But H&S demonstrates a good example of learning from a noisy environment, namely one "hallucinated" from the dynamics learned on the embeddings.)

Christian Kauten · Answer 32 · Tue Jan 01 2019 05:45:25 GMT+0800 (China Standard Time)

Good news! I found a good object oriented NES implementation using C++ 11 that I'm porting as a new back-end for nes-py. Once that's done this bug should no longer be a problem over here :). 5.0.0 and 6.0.1 bugs occur at roughly the same rate and both resolve after some unknown number of episodes. The behavior in 5.0.0 is a normal game, but all background pixels are missing, so sprites still render fine. If we're talking CNN based techniques, this could influence the feature extractors to focus on sprite features as they will render naturally. However, it will also incline the agent to "memorize" the path as holes and platforms will likely not render. For 6.0.1, there is a game playing on top of another or something? But I imagine this would really confuse a CNN based solution and lead more towards memorizing paths. But as @EliasHasle mentioned, the noisy data in this area is common and could actually improve the performance through some unintuitive means! When bugs are resolved, it will be interesting to conduct a study as to how these bugs impact various RL techniques.

EDIT: a link

Elias Hasle · Answer 33 · Tue Jan 01 2019 08:02:29 GMT+0800 (China Standard Time)

I am happy to hear that you are not giving up on this project. I have just discovered openAI's Retro Gym, which also supports SMB. I will try that out too, but I don't know what to expect from such a general framework, in terms of cutscene removal and other game-specific optimizations. Perhaps a viable approach could be making a wrapper environment (around the retro environment) that skips uninteresting sequences and converts to enumerated controls etc, if necessary and sufficiently performant.

Baek In-Chang · Answer 34 · Tue Jan 01 2019 15:42:37 GMT+0800 (China Standard Time)

It seems the actions may be reordered, which explains the poor performance of my prespecified policy. Using a newly optimized policy, the results look reasonable for a while. But, as you can see, the new graphical glitch is very much present (as well as frequent hangups). This run started with 16 independent processes, each with their own window, and over time all of them got the bug, and the five remaining ones hang too.

Behavior confirmed in 6.0.1 and 5.0.1, whereas 5.0.0 neither hangs nor has the red/black stripes bug, but has the bug described in the original post.

@EliasHasle I have the same bug in that version, but I don't know how the glitch happens.
When multiple threads were used for testing, almost all games eventually showed 'no response'.
Even with the random action sample code, this problem was found, and it appears after a random number of episodes.

Christian Kauten · Answer 35 · Tue Jan 01 2019 17:29:53 GMT+0800 (China Standard Time)

@EliasHasle I made up some academic reasons so I could feel better about spending time on it lol. Also, I ran into open ai retro shortly after porting LaiNES for this project (bit of a facepalm). It may be better or worse depending on how they set it up though.

@bic4907 if by threads you mean Python's threading.Thread class, that's because a Thread is still contained in the same process and the NES emulator's back-end currently has globally namespaced data structures. So, you'd need separate processes for each instance of the emulator using multiprocessing.Process. That will be resolved by the new back-end.
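
The difference can be illustrated with a toy module-level counter standing in for the back-end's global state. This is a sketch with hypothetical names and does not touch nes-py itself.

```python
import multiprocessing
import threading

_frames_emulated = 0  # stand-in for globally namespaced emulator state

def emulate(frames, results, key):
    """Advance the global counter and record what this worker ends up seeing."""
    global _frames_emulated
    for _ in range(frames):
        _frames_emulated += 1
    results[key] = _frames_emulated

def run_in_threads():
    """Two 'instances' in threads share one counter, so b sees a's frames."""
    results = {}
    for key in ("a", "b"):
        worker = threading.Thread(target=emulate, args=(100, results, key))
        worker.start()
        worker.join()  # run one after the other to keep the demo deterministic
    return results

def _emulate_in_process(frames, queue, key):
    results = {}
    emulate(frames, results, key)
    queue.put((key, results[key]))

def run_in_processes():
    """Each process has its own copy of the counter, so a and b agree."""
    queue = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=_emulate_in_process, args=(100, queue, key))
        for key in ("a", "b")
    ]
    for worker in workers:
        worker.start()
    collected = dict(queue.get() for _ in workers)
    for worker in workers:
        worker.join()
    return collected
```

With threads, worker b ends 100 frames ahead of worker a because both advance the same counter. With processes, both report the same value, since neither sees the other's increments. On platforms that spawn rather than fork (Windows, for example), run_in_processes() needs to be called under an if __name__ == "__main__": guard.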

Baek In-Chang · Answer 36 · Tue Jan 01 2019 17:59:51 GMT+0800 (China Standard Time)

@Kautenja I used the multiprocessing package for testing. This issue seems to occur in this sample code as well. My environment is Windows 10 64bit with package 6.0.1. Is it stable on Ubuntu? Or do you have a stable package version to try? Thank you for your effort :)

Christian Kauten · Answer 37 · Wed Jan 02 2019 02:47:59 GMT+0800 (China Standard Time)

Interesting, I've not tested much on Windows beyond ensuring that I could get it installed, compiled, and have all unit tests pass. Ubuntu and MacOS have worked pretty reliably for me with all versions, but I've been running serial processes. Does the issue occur simultaneously across all instances, like do they all fail at the same time? And if so, is any actuation of individual agents conditioned on a random seed and is the seed set to different values between processes? I'd not seen the issue in the sample script yet, does it occur after a certain number of steps? Also no problem!

Regarding stability, there is a really old version somewhere using a really slow NES back-end that is free of this glitch. Unfortunately, it has been a while since, and I have no clue which versions of gym-super-mario-bros and nes-py you'd need to match up to make that work. But, it is an option. It's a fork of the ppaquette version that has some improvements (including parallelism support). However, if you can wait around 2 weeks, I should have the new back-end up and running by then with (hopefully) no glitches, fast execution speeds, and maybe some bonus features :)

Elias Hasle · Answer 38 · Wed Jan 02 2019 03:11:46 GMT+0800 (China Standard Time)

I believe @bic4907 intends to say that it happens irrespective of multiprocessing. Quote:

When multiple threads were used for testing, almost all games eventually showed 'no response'.
Even with the random action sample code, this problem was found, and it appears after a random number of episodes.

Baek In-Chang · Answer 39 · Wed Jan 02 2019 18:48:52 GMT+0800 (China Standard Time)

@EliasHasle Yes, it also happens in a single process (not using multiprocessing)

@Kautenja Testing in Windows 10 and Ubuntu 16.04 reveals that the process dies irregularly. These issues are frequently discovered before 500 episodes: first a glitch occurs on the screen, then the backend dies after a while. I'll have to test it again with the new back-end.

Elias Hasle · Answer 40 · Wed Jan 02 2019 18:59:05 GMT+0800 (China Standard Time)

I have also experienced several hangups on sky blue screen, possibly without first getting the red-black stripe pattern glitch (but I am not sure).

Let's hope that with the new backend, either the problems are gone or they are much easier to debug. :-)

Elias Hasle · Answer 41 · Wed Jan 02 2019 22:52:10 GMT+0800 (China Standard Time)

Also, the red/black pattern is the most common appearance of the glitch. Sometimes it looks different, with red rings in a grid pattern on part of the screen etc., but seems to converge to the most common after a while.

Christian Kauten · Answer 42 · Thu Jan 03 2019 03:55:16 GMT+0800 (China Standard Time)

I've seen the blue screen as well, but can't reproduce that one atm. It's odd that the glitch produces this uniform pattern. @EliasHasle have you tried building nes-py from source with the fix you introduced to LaiNES? I haven't been by my workstation to properly inject that and try it out, but will be able to tomorrow afternoon

Elias Hasle · Answer 43 · Thu Jan 03 2019 19:25:37 GMT+0800 (China Standard Time)

No, I haven't tried that. I just thought it could be of interest to you, for your academic project... :-) Indeed, it will be best for a valid comparison between the buggy and bug-free version if they are based on the same back-end.

Elias Hasle · Answer 44 · Thu Jan 03 2019 20:56:02 GMT+0800 (China Standard Time)

Also, I ran into open ai retro shortly after porting LaiNES for this project (bit of a facepalm). It may be better or worse depending on how they set it up though.
Regarding stability, there is a really old version somewhere using a really slow NES back-end that is free of this glitch.

I believe OpenAI use a similar backend to the one you used before. Out of curiosity, I checked source file names for both, and OpenAI appears to use a variant of FCEUX, which is at least a relative of the FCEU you used. I am hoping to see some insane speedups with the new backend. :)

I don't know if this is a stupid question/request, but I would like to be able to limit the number of internal screen renderings of the emulator through the environment object, if that is possible. This would be in addition to skipping internal rendering during interaction frameskips (which I guess you are planning already, if it is at all possible). An application of this control could be for higher-order non-uniform frameskip during frame-sampling procedures or sticky actions etc.

Christian Kauten · Answer 45 · Fri Jan 04 2019 00:07:47 GMT+0800 (China Standard Time)

Hmm, yea if they're using FCEU they either have to have plugged into the cpp with ctypes (or whatever python to cpp framework they prefer) or written a lua script as a general purpose interface between retro and FCEU (how ppaquette gym-smb works). If it's the latter, you can expect it to be pretty slow. I can't speak for the former, but OpenAI surely has the engineering man hours to make that sort of change happen. I spent days trying to port FCEUX for use with ctypes before giving up because the codebase is huge and hard to follow.

Not a stupid request at all! I think some means of control for frame skipping is an excellent idea and one that is supported by the research. Ultimately, I think the best option (perhaps at the cost of some performance) is to remove frame-skipping from the environment entirely, and instead introduce this functionality using an external wrapper class. This way, users could design their own implementation of the feature without resorting to polymorphism or having to register tons of different environments with gym. It will also be easier to test and have better style since to me, frame skipping in the env is functional envy. Great idea!
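
A minimal sketch of what such an external wrapper might look like, assuming the (observation, reward, done, info) step signature used elsewhere in this thread. FrameSkip is a hypothetical class, and summing rewards over the skipped frames is one design choice among several.

```python
class FrameSkip:
    """Repeat each action for `skip` frames and sum the rewards."""

    def __init__(self, env, skip=4):
        if skip < 1:
            raise ValueError("skip must be at least 1")
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            observation, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # do not step a finished episode
        # The caller sees only the last frame of the skipped block.
        return observation, total_reward, done, info
```

Since skip is an ordinary attribute, a caller could change it between steps to get non-uniform frame skipping without registering additional environments.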

Elias Hasle · Answer 46 · Fri Jan 04 2019 00:19:21 GMT+0800 (China Standard Time)

Retro connects directly to cpp. I have spent some time today trying to make a wrapper for retro that makes the experience very similar to the one with yours, borrowing some of your tricks for skipping busy frames: https://gist.github.com/EliasHasle/5958fceccbb10f8279c860caa8c31534
I have not compared performance yet, but it looks OK. The available memory interface requires reading and writing the whole state at once, which I guess is slower than modifying single addresses. It is still fast, but the narrow interface does not allow all the hacking I imagine one can do to improve performance further, e.g. deeper frame skipping (by which I mean sparing pure visualization calculations in the backend, if they exist. They are perhaps contained in the emulator's get_screen method already.).

Christian Kauten · Answer 47 · Sat Jan 05 2019 06:12:14 GMT+0800 (China Standard Time)

The new back-end is in place with all basic features :) I'll release the code sometime today under 3.0.0 hopefully with this bug completely resolved. I haven't run a true performance test yet, but this back-end seems quite a bit faster. It was definitely easier to work with! Will close issue for now, but won't be surprised to see it open back up lol.

Christian Kauten · Answer 48 · Sat Jan 05 2019 06:14:20 GMT+0800 (China Standard Time)

whoops it'll be version 6.0.2 actually. nes-py is the one at 3.0.0.

Christian Kauten · Answer 49 · Sat Jan 05 2019 06:23:12 GMT+0800 (China Standard Time)

using tqdm to measure execution time:

from nes_py import NESEnv
import tqdm
env = NESEnv('./nes_py/tests/games/smb1.nes')

done = True

try:
    for _ in tqdm.tqdm(range(5000)):
        if done:
            state = env.reset()
            done = False
        else:
            state, reward, done, info = env.step(env.action_space.sample())
except KeyboardInterrupt:
    pass

I get 456 iterations per second using 2.0.0 of nes-py and 537 iterations per second using 3.0.0. About an 18% increase.
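
For reference, the relative increase from those two rates works out as follows (relative_speedup is just a helper name for this sketch):

```python
def relative_speedup(old_rate, new_rate):
    """Percentage increase of new_rate over old_rate."""
    return (new_rate - old_rate) / old_rate * 100.0

speedup = relative_speedup(456, 537)
print(f"{speedup:.1f}% faster")  # prints "17.8% faster"
```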

Elias Hasle · Answer 50 · Sun Jan 06 2019 17:50:34 GMT+0800 (China Standard Time)

A similar test with retro reveals that it performs about 3.5 times better, both with and without rendering to screen. I almost decided to use that instead, but it turned out to not support multiple emulator instances per process. I am happy to see that the newest version of nes-py is so flexible with regards to this. I am considering having multiple environment instances in each thread too, e.g. to send batches of frames from independent games through the same neural network. I may end up using both nes-py and retro, actually, for different things (must take care to have compatible reward functions if rewards will be used). Good job! :-)

Christian Kauten · Answer 51 · Mon Jan 07 2019 00:24:15 GMT+0800 (China Standard Time)

Yea I'm not sure quite yet what trickery gets retro up to such a high frame rate. I was able to squeak out another 100 FPS from nes-py just by rearranging the C++ code here and there, so it's likely that more free performance gains are hidden in the code. Will continue reading up on C++ and injecting optimizations where possible. It would be great to dethrone their emulator!