Flag get detection
liziniu opened this issue · comments
Hi, I use the 'SuperMarioBros-1-1-v0' environment and want to stay on this stage only.
However, the wrapper sometimes fails to detect that the flag has been reached, and the game runs on into the next stage.
Is there any way to ensure the simulator stays in a single stage?
I watched the variable self.ram[0x001D] alongside the video. When the Mario agent gets the flag, this variable is mostly 3, but occasionally 2.
I also see that the variable is 1 when Mario is in a normal state. Can anyone explain what a value of 2 means?
Hmm, could you run this shell command to print some version information? Issues with the flag get feature have occurred in the past, so I want to make sure this is a new problem.
python3 -c 'import pkg_resources; \
print(pkg_resources.get_distribution("nes-py").version); \
print(pkg_resources.get_distribution("gym-super-mario-bros").version)'
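The same check can also be written with the standard library's importlib.metadata (Python 3.8+), which avoids pkg_resources. This is only a sketch: the helper name installed_version is made up for illustration and is not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(name):
    """Return the installed version string of a distribution, or None if absent.

    `installed_version` is a hypothetical helper, not part of nes-py or
    gym-super-mario-bros.
    """
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# Print the versions of the two packages discussed in this issue.
for package in ("nes-py", "gym-super-mario-bros"):
    print(package, installed_version(package))
```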
Version aside, it looks like fixing this could be as simple as changing line 247 of smb_env.py:
gym-super-mario-bros/gym_super_mario_bros/smb_env.py
Lines 237 to 249 in 9154ece
from:
return self.ram[0x001D] == 3
to:
return self.ram[0x001D] in {2, 3}
It looks like 0x001D is equal to 2 in more situations than just being on the flagpole. Namely, when the flagpole first enters the scene I see 0x001D equal to 2 for a burst of frames, but never while Mario is on the pole. This does not disprove the bug's existence, but it does rule out the fix proposed above.
Also note the RAM map description:
0x001D
Player "float" state
0x00 - Standing on solid/else
0x01 - Airborne by jumping
0x02 - Airborne by walking off a ledge
0x03 - Sliding down flagpole
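To make logged values easier to read while debugging, the four states listed above can be mapped to names. This is a rough sketch: FloatState and read_float_state are illustrative names that do not exist in gym-super-mario-bros, and ram is assumed to be any indexable sequence of byte values such as the environment's RAM array.

```python
from enum import IntEnum

FLOAT_STATE_ADDRESS = 0x001D  # address taken from the RAM map above


class FloatState(IntEnum):
    """Player "float" state values as listed in the RAM map (names are illustrative)."""
    STANDING = 0x00
    AIRBORNE_JUMP = 0x01
    AIRBORNE_LEDGE = 0x02
    FLAGPOLE_SLIDE = 0x03


def read_float_state(ram):
    """Decode the float state from a RAM-like sequence.

    Returns None for values the RAM map does not document.
    """
    value = int(ram[FLOAT_STATE_ADDRESS])
    try:
        return FloatState(value)
    except ValueError:
        return None
```

Logging read_float_state(ram) on every frame next to the recorded video should show whether a 2 near the flag is just the "walked off a ledge" state rather than a flagpole slide.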
I ran the shell command. It printed the following version information:
6.2.1
7.1.6
I know this is not the latest version, but I can't see any difference in _is_stage_over between the latest code and the code I use. Is there any special modification?
After many trials, I also found that 0x001D being equal to 2 is not an indicator of getting the flag.
My algorithm generates action sequences (and a separate video recorder program shows that it occasionally goes into the next stage). What's worse, my algorithm has some stochastic factors (such as randomly skipping frames), so it is hard for me to reproduce this manually, but I can provide the recorded video if that would help.
For my research, I focus on SuperMarioBros-1-1-v0. Thus, I use x_pos > 3155 as an indicator of getting the flag.
Thanks!
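The x_pos workaround described above could be packaged as a small wrapper, sketched here under a few assumptions: the environment follows the older gym API in which step returns (obs, reward, done, info), the info dict carries an "x_pos" entry, and the 3155 threshold applies only to stage 1-1 as reported in this thread. XPosFlagWrapper and the "flag_by_x_pos" key are made-up names, not part of gym-super-mario-bros.

```python
class XPosFlagWrapper:
    """End the episode once Mario's x position passes a threshold.

    Hypothetical helper: assumes a 4-tuple step API and an "x_pos" key in info.
    """

    def __init__(self, env, x_threshold=3155):
        self.env = env
        self.x_threshold = x_threshold  # value reported above for stage 1-1

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Treat passing the threshold as reaching the flag.
        reached = info.get("x_pos", 0) > self.x_threshold
        info = dict(info, flag_by_x_pos=reached)
        # Mark the episode done so the caller resets before the next stage loads.
        return obs, reward, done or reached, info
```

Because the wrapper only sets done, the caller still has to call reset when an episode ends, which is what keeps the emulator from continuing into the next stage.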
Closing this issue as it seems resolved for now.