Kautenja / gym-super-mario-bros

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES

Flag get detection

liziniu opened this issue

Hi, I am using the 'SuperMarioBros-1-1-v0' env and want to stay on this stage only.

However, the wrapper sometimes fails to detect that the flag has been reached, and the game runs on into the next stage.

Is there any way to ensure the simulator stays in a single stage?

I watched the variable self.ram[0x001D] alongside the video. When Mario gets the flag, this variable is usually 3, but occasionally 2.

I also see that the variable is 1 when Mario is in a normal state. Can anyone explain what a value of 2 means?

Hmm, could you run this shell command to print some version information? Issues with the flag get feature have occurred in the past, so I want to make sure this is a new problem.

python3 -c 'import pkg_resources; \
print(pkg_resources.get_distribution("nes-py").version); \
print(pkg_resources.get_distribution("gym-super-mario-bros").version)'

Version aside, it looks like fixing this could be as simple as changing line 247 of smb_env.py:

    @property
    def _is_stage_over(self):
        """Return a boolean determining if the level is over."""
        # iterate over the memory addresses that hold enemy types
        for address in _ENEMY_TYPE_ADDRESSES:
            # check if the byte is either Bowser (0x2D) or a flag (0x31)
            # this is to prevent returning true when Mario is using a vine
            # which will set the byte at 0x001D to 3
            if self.ram[address] in _STAGE_OVER_ENEMIES:
                # player float state set to 3 when sliding down flag pole
                return self.ram[0x001D] == 3
        return False

from:

                return self.ram[0x001D] == 3

to:

                return self.ram[0x001D] in {2, 3}

It looks like 0x001D is equal to 2 in more situations than just being on the flag pole. Namely, when the flag pole first enters the scene I see 0x001D equal to 2 for a burst of frames, but never when Mario is on the pole. This does not disprove the bug's existence, but it does rule out the fix proposed above.

Also note the RAM map description:

0x001D

Player "float" state
0x00 - Standing on solid/else
0x01 - Airborne by jumping
0x02 - Airborne by walking off a ledge
0x03 - Sliding down flagpole
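
The table above can be turned into a small lookup for logging what the byte holds frame by frame, which may help when checking claims such as "2 appears when the pole enters the scene". This is only a sketch: `FLOAT_STATES`, `describe_float_state`, and `float_state_changes` are hypothetical helper names that are not part of gym-super-mario-bros, and `ram` is assumed to be any indexable sequence of bytes.

```python
# Address of the player "float" state byte, taken from the RAM map above.
FLOAT_STATE_ADDRESS = 0x001D

# Meanings as listed in the RAM map; other values are treated as unknown.
FLOAT_STATES = {
    0x00: "standing on solid/else",
    0x01: "airborne by jumping",
    0x02: "airborne by walking off a ledge",
    0x03: "sliding down flagpole",
}


def describe_float_state(ram):
    """Return a readable label for the float state byte in `ram`."""
    value = int(ram[FLOAT_STATE_ADDRESS])
    return FLOAT_STATES.get(value, "unknown (0x{:02X})".format(value))


def float_state_changes(values):
    """Return (frame_index, value) for every frame where the value changes.

    `values` is assumed to be one float state reading per frame.
    """
    changes = []
    previous = None
    for frame, value in enumerate(values):
        if value != previous:
            changes.append((frame, value))
            previous = value
    return changes
```

Feeding `float_state_changes` one reading per step gives a compact trace of when the byte switches between 1, 2, and 3, which is easier to line up against a video than a full per-frame dump.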

I ran the shell command. It printed the following version information:

6.2.1

7.1.6

I know these are not the latest versions, but I can't see any difference in _is_stage_over between the latest code and the code I use. Has it been modified in any particular way?

After many trials, I can also confirm that 0x001D being equal to 2 is not an indicator of getting the flag.

My algorithm generates the action sequences, and a separate video recorder program shows that the game occasionally goes into the next stage. What's worse, my algorithm has stochastic elements (such as randomly skipping frames), so it is hard for me to reproduce this manually, but I can provide the video recording if that would help you.

My research focuses on SuperMarioBros-1-1-v0, so I use x_pos > 3155 as the indicator of getting the flag.
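
For anyone wanting to try the same workaround, it could be sketched as a thin wrapper like the one below. The threshold 3155 comes from the sentence above and applies only to stage 1-1. Everything else is an assumption: `StopAtXPosition` is a hypothetical name, the env is assumed to follow the classic Gym step API returning `(state, reward, done, info)`, and `info` is assumed to contain an `x_pos` entry.

```python
class StopAtXPosition:
    """Wrap an env and report `done` once info['x_pos'] passes a threshold.

    Hypothetical helper, not part of gym-super-mario-bros.
    """

    def __init__(self, env, x_threshold=3155):
        self.env = env
        # 3155 is the value quoted for SuperMarioBros-1-1-v0 only.
        self.x_threshold = x_threshold

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        # Assumes the classic 4-tuple Gym step API.
        state, reward, done, info = self.env.step(action)
        # Treat passing the threshold as the end of the episode, so the
        # caller resets before the game can load the next stage.
        if info.get("x_pos", 0) > self.x_threshold:
            done = True
        return state, reward, done, info
```

Because the wrapper only overrides `done`, the training loop still has to call `reset()` when it sees `done` for the episode to actually stay on one stage.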

Thanks!

Closing the issue as it seems resolved for now.