PWhiddy / PokemonRedExperiments

Playing Pokemon Red with Reinforcement Learning

Frame cycles from pyboy inconsistent (w/ patch demo)

stangerm2 opened this issue

The frames PyBoy produces during an action are inconsistent and appear to reset mid-render. This is a concern when trying to build a reliable observation/feedback loop.

Here's the situation in pictures:

We start at Oak and issue one down command. Frames 0->19 play, and the 'final' frame still shows the player in the next position (good so far):
image

Another down command: frames 0->16 look good, but then the numbering jumps to 20 (which doesn't match the path output in the terminal). Regardless, frames 20-24 jump back to the starting position (bad/weird, I don't know). The final frame is in the right place. Still, this is a problem: whatever is in video memory shouldn't render this jump:
image

A few more steps down and we exit Oak's building. Frames 0-1 start the exit from map_x_y position 40_5_11, then frames 2-34 jump to 0_5_11 (almost OK, since we just exited), but there are still frames 17-21 at 40_5_11. The x,y positions seem to jump back and forth, somehow unsettled. This matches my observations from more complicated runs, and hopefully this simple example gives some weight to the claim.
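One way to make the "jumping back and forth" claim checkable is to log the (map, x, y) triple once per frame and flag every frame where the position returns to a value it had already left. The sketch below is only an illustration: `find_position_reverts` is a hypothetical helper, not part of the repo or of PyBoy, and it assumes the per-frame samples have already been collected (for example from the same reads of 0xD35E, 0xD362 and 0xD361 that the patch uses).

```python
from typing import List, Tuple

# (map_id, x, y), mirroring the 'map_x_y' screenshot folder names
Position = Tuple[int, int, int]


def find_position_reverts(samples: List[Position]) -> List[int]:
    """Return frame indices where the position goes back to a value it already left."""
    reverts = []
    left_behind = set()  # positions that were observed and later replaced
    previous = None
    for frame, pos in enumerate(samples):
        if previous is not None and pos != previous:
            left_behind.add(previous)
            if pos in left_behind:
                reverts.append(frame)
        previous = pos
    return reverts


# Hypothetical log shaped like the door-exit case described above.
door_exit = [(40, 5, 11)] * 2 + [(0, 5, 11)] * 3 + [(40, 5, 11)] * 2 + [(0, 5, 11)]
print(find_position_reverts(door_exit))
```

A clean single step (one position, then the next, never going back) yields an empty list, so any non-empty result points at the exact frames worth inspecting.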

As an experienced embedded software dev I'm not a Python expert, but I do know memory, and it really looks like the values being pulled from video memory/RAM are not stable. Maybe I'm missing something, but I would love for anyone to weigh in on what's going on. I lean toward this maybe being a PyBoy bug. @Baekalfen, what do you think?

Thanks

See the commit message for how to run this very simple demo patch, which applies on top of f8a7fdf, to see it for yourself.

Demo Patch:

From 7776d90cfd3fdec8a7539cb69aa6b4d98ad2bd18 Mon Sep 17 00:00:00 2001
From: Matthew Stanger <stangerm2@gmail.com>
Date: Sat, 28 Oct 2023 13:54:36 -0700
Subject: [PATCH] Demo: Frame cycles from pyboy inconsistent

This is an issue because when training AI the game's video
output per 'turn' or frame completion cycle needs to be very
precise to create a perfect turn feedback observation.

This patch shows an issue where, after keying off a frame
animation memory flag (i.e. the sprite should be still & its
animation has been completed), the screen output is stale,
reset, or a starting frame.

The main memory key 0xC108: wSpritePlayerStateData1AnimFrameCounter
is used, and when comparing this value on the mGBA emulator the end
screen frame is what's expected.

Bug Demo Usage:
    python run_pretrained_interactive.py

Result:
The emulation will run 12 dpad down cmds starting from Prof. Oak
and during this will capture every frame for each step. It saves
every frame by 'map_x_y' folder and screenshot by
'self.id_step-count_frame-number'.

You will see that the sprite in the screenshots moves down in
animation and then at the end of some frames jumps back to the
starting pos. This is the issue.
---
 baselines/agent_enabled.txt             |  2 +-
 baselines/red_gym_env.py                | 59 +++++++++++++++++--------
 baselines/run_pretrained_interactive.py | 16 ++++---
 3 files changed, 50 insertions(+), 27 deletions(-)

diff --git a/baselines/agent_enabled.txt b/baselines/agent_enabled.txt
index 7cfab5b..7ecb56e 100644
--- a/baselines/agent_enabled.txt
+++ b/baselines/agent_enabled.txt
@@ -1 +1 @@
-yes
+no
diff --git a/baselines/red_gym_env.py b/baselines/red_gym_env.py
index 5d23527..468abf1 100644
--- a/baselines/red_gym_env.py
+++ b/baselines/red_gym_env.py
@@ -108,7 +108,7 @@ class RedGymEnv(Env):
         self.screen = self.pyboy.botsupport_manager().screen()
 
         if not config['headless']:
-            self.pyboy.set_emulation_speed(6)
+            self.pyboy.set_emulation_speed(1)
             
         self.reset()
 
@@ -190,11 +190,13 @@ class RedGymEnv(Env):
     
     def step(self, action):
 
+        cur_map_n, cur_x_pos, cur_y_pos = self.read_m(0xD35E),  self.read_m(0xD362), self.read_m(0xD361)
+
         self.run_action_on_emulator(action)
         self.append_agent_stats(action)
 
         self.recent_frames = np.roll(self.recent_frames, 1, axis=0)
-        obs_memory = self.render()
+        obs_memory = self.render(reduce_res=False)
 
         # trim off memory from frame for knn index
         frame_start = 2 * (self.memory_height + self.mem_padding)
@@ -225,25 +227,42 @@ class RedGymEnv(Env):
 
         self.step_count += 1
 
+        ss_dir = self.s_path / Path(f'screenshots/{cur_map_n}_{cur_x_pos}_{cur_y_pos}')
+        ss_path = f'{id(self)}_final.jpeg'
+        plt.imsave(
+            Path(f'{ss_dir}/{ss_path}.jpeg'),
+            obs_memory)
+
         return obs_memory, new_reward*0.1, False, step_limit_reached, {}
 
     def run_action_on_emulator(self, action):
+        user_inputs = self.pyboy.get_input()
+        animation_started = False
+        cur_map_n, cur_x_pos, cur_y_pos = self.read_m(0xD35E),  self.read_m(0xD362), self.read_m(0xD361)
+
+        print(f'\naction: {WindowEvent(action).__str__()}, user_inputs: {str(user_inputs)[1:-1]},'
+              f' x:{cur_x_pos}, y:{cur_y_pos}, map: {cur_map_n}')
+
         # press button then release after some steps
-        self.pyboy.send_input(self.valid_actions[action])
+        self.pyboy.send_input(action)
         # disable rendering when we don't need it
         if not self.save_video and self.headless:
             self.pyboy._rendering(False)
-        for i in range(self.act_freq):
-            # release action, so they are stateless
-            if i == 8:
-                if action < 4:
-                    # release arrow
-                    self.pyboy.send_input(self.release_arrow[action])
-                if action > 3 and action < 6:
-                    # release button 
-                    self.pyboy.send_input(self.release_button[action - 4])
-                if self.valid_actions[action] == WindowEvent.PRESS_BUTTON_START:
-                    self.pyboy.send_input(WindowEvent.RELEASE_BUTTON_START)
+        for i in range(self.act_freq): # Frames for animation vary, xy move ~22, wall collision ~13 & zone reload ~66
+            self.save_screenshot(self.read_m(0xD362), self.read_m(0xD361), self.read_m(0xD35E), i)
+
+            # wSpritePlayerStateData1AnimFrameCounter, non-zero when sprite anim frames are playing
+            moving_animation = self.read_m(0xC108)
+            print(f'Moving: {moving_animation}, map: {self.read_m(0xD35E)}, frame: {i}')
+
+            if animation_started and moving_animation == 0:
+                break
+
+            # Release the key once the animation starts so it should only be possible to advance 1 pos.
+            if moving_animation > 0:
+                animation_started = True
+                self.pyboy.send_input(WindowEvent.RELEASE_ARROW_DOWN)
+
             if self.save_video and not self.fast_video:
                 self.add_video_frame()
             if i == self.act_freq-1:
@@ -535,11 +554,13 @@ class RedGymEnv(Env):
         
         return state_scores
     
-    def save_screenshot(self, name):
-        ss_dir = self.s_path / Path('screenshots')
-        ss_dir.mkdir(exist_ok=True)
+    def save_screenshot(self, x_pos_cur = 0, y_pos_cur = 0, n_map_cur = 0, i = 0):
+        ss_dir = self.s_path / Path(f'screenshots/{n_map_cur}_{x_pos_cur}_{y_pos_cur}')
+        ss_path = f'{id(self)}_{i}.jpeg'
+        print(f'path: {ss_dir}/{ss_path}')
+        ss_dir.mkdir(parents=True, exist_ok=True)
         plt.imsave(
-            ss_dir / Path(f'frame{self.instance_id}_r{self.total_reward:.4f}_{self.reset_count}_{name}.jpeg'), 
+            Path(f'{ss_dir}/{ss_path}.jpeg'),
             self.render(reduce_res=False))
     
     def update_max_op_level(self):
@@ -618,4 +639,4 @@ class RedGymEnv(Env):
             return map_locations[map_idx]
         else:
             return "Unknown Location"
-    
\ No newline at end of file
+    
diff --git a/baselines/run_pretrained_interactive.py b/baselines/run_pretrained_interactive.py
index 2c26713..5dc0067 100644
--- a/baselines/run_pretrained_interactive.py
+++ b/baselines/run_pretrained_interactive.py
@@ -30,9 +30,10 @@ if __name__ == '__main__':
 
     env_config = {
                 'headless': False, 'save_final_state': True, 'early_stop': False,
-                'action_freq': 24, 'init_state': '../has_pokedex_nballs.state', 'max_steps': ep_length, 
+                'action_freq': 120, 'init_state': '../has_pokedex_nballs.state', 'max_steps': ep_length,
                 'print_rewards': True, 'save_video': False, 'fast_video': True, 'session_path': sess_path,
-                'gb_path': '../PokemonRed.gb', 'debug': False, 'sim_frame_dist': 2_000_000.0, 'extra_buttons': True
+                'gb_path': '../PokemonRed.gb', 'debug': False, 'sim_frame_dist': 2_000_000.0, 'extra_buttons': True,
+                'use_screen_explore': False
             }
     
     num_cpu = 1 #64 #46  # Also sets the number of episodes per training iteration
@@ -47,7 +48,7 @@ if __name__ == '__main__':
     #keyboard.on_press_key("M", toggle_agent)
     obs, info = env.reset()
     while True:
-        action = 7 # pass action
+        action = 2 # Down D-Pad action
         try:
             with open("agent_enabled.txt", "r") as f:
                 agent_enabled = f.readlines()[0].startswith("yes")
@@ -55,8 +56,9 @@ if __name__ == '__main__':
             agent_enabled = False
         if agent_enabled:
             action, _states = model.predict(obs, deterministic=False)
-        obs, rewards, terminated, truncated, info = env.step(action)
-        env.render()
-        if truncated:
-            break
+        for i in range(12):
+            obs, rewards, terminated, truncated, info = env.step(action)
+        #env.render()
+        #if truncated:
+        break
     env.close()
-- 
2.34.1

While watching my sessions, I sometimes notice a battle end, return to the battle, and then end again, so I can confirm something weird is happening.

I've no reason to believe this is a fault in PyBoy. If that was the case, I think I would have caught it in automated testing and manual testing. PyBoy doesn't really have the ability to randomly reset, unless you're issuing save/load state.

I don't have time to study the patch, but if I were you, I'd set a breakpoint() or input() after saving each frame to verify it's getting saved correctly. Maybe you have a race condition.
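A minimal sketch of that check, under stated assumptions: the frame is available as raw bytes (for example `ndarray.tobytes()` of the rendered screen), it is written losslessly rather than as JPEG, and `save_and_verify` is a hypothetical helper rather than anything in the repo or in PyBoy. It writes the frame, reads it back, compares SHA-256 digests, and can optionally block on `input()` so each frame can be inspected before the emulator advances.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a raw frame buffer."""
    return hashlib.sha256(data).hexdigest()


def save_and_verify(frame_bytes: bytes, path: Path, pause: bool = False) -> bool:
    """Write a frame, read it back, and report whether the bytes survived intact."""
    expected = digest(frame_bytes)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(frame_bytes)
    ok = digest(path.read_bytes()) == expected
    if pause:
        # Blocks until Enter is pressed, so nothing else can run in between.
        input(f"{path.name}: {'ok' if ok else 'MISMATCH'} - press Enter to continue")
    return ok


# Usage with a stand-in buffer; a real run would pass the emulator's frame bytes.
with tempfile.TemporaryDirectory() as tmp:
    fake_frame = bytes(range(256)) * 4
    print(save_and_verify(fake_frame, Path(tmp) / "40_5_11" / "frame_0.bin"))
```

Comparing digests only works for a lossless dump; a JPEG written by `plt.imsave` will not round-trip byte for byte, so this check would need a raw or PNG copy alongside the screenshots.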

Hi! Thanks for sharing this with detailed info. I'm still having a bit of trouble understanding the issue here. Does this only happen when running with GUI? I don't think I've observed anything like what you're describing in my experiments.

Let's just close this for now. I don't think this is actionable at the moment.