Farama-Foundation / PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities

Home Page: https://pettingzoo.farama.org

[Bug Report] Observation does not get updated in Atari multi-player games after the first player's action.

Justkim opened this issue

Describe the bug

It seems that after the first player takes an action in an Atari game, the observation is the same as it was before the action. This can be problematic for reinforcement learning algorithms, since the next state constructed from this observation may not reflect the consequence of the action taken. The code snippet below checks this by running an Atari environment, taking actions, and asserting that the first player's observation is unchanged. If the code runs without an assertion error, it means the first player's observation never changed across its action. Sometimes the observation legitimately stays the same, e.g. due to a no-op or during the early stages of the game; the problem is that it is always the same before and after an action by the first player.

Code example

from pettingzoo.atari import space_invaders_v2
import numpy as np

env = space_invaders_v2.env()
env.reset(seed=42)
# Observation of the first agent to move, captured right after reset.
pre, reward, termination, truncation, info = env.last()

num_episodes = 10
episode = 0
for agent in env.agent_iter():
    if termination or truncation:
        episode += 1
        if episode > num_episodes:
            print("done")
            break
        env.reset()
        pre, reward, termination, truncation, info = env.last()
    else:
        action = env.action_space(agent).sample()  # random action for the current agent
        env.step(action)
        observation, reward, termination, truncation, info = env.last()
        if agent == "first_0":
            # Passes (no AssertionError) whenever the observation is
            # identical before and after the first player's action.
            assert np.array_equal(pre, observation)
        pre = observation

env.close()

System info

  • Describe how PettingZoo was installed: using pip
  • Version of pettingzoo: 1.24.3
  • What OS/version you're using: Ubuntu 22.04.1 LTS, Linux 6.5.0-15-generic, x86-64
  • Python version: 3.11.0

Additional context

The provided code only checks Space Invaders, but I have tried other environments and got the same result. I also tried the same check for the second player (called "second_0") and got an assertion error at some point, which shows that the observation does change after the second player's action, as expected.

Checklist

  • I have checked that there is no similar issue in the repo

Hi, I haven't used the Atari envs so I'm not 100% sure this explanation is correct, but I think this is working as intended.

Note that last() gives the observation of the agent about to move, so pre is what first_0 saw before it moved, and observation is what second_0 sees before it moves. Since this env is based on a parallel env, the observation does not change until all agents have moved, so those two should be the same.

When you change the check to second_0, you get the assertion error you expect because you are comparing what second_0 saw (before all agents acted) to what first_0 saw (after all agents acted, which triggered an observation update). Those should be different.

Alternatively, you can put pre = observation inside the if agent == "first_0": branch. The comparison then spans a full joint step: the observation captured on first_0's current turn against the one captured on its previous turn. Those should be different, and will trigger the assert as you're expecting.
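
A minimal sketch of that suggested change (same setup as the original snippet; untested on my end, since I haven't run the Atari envs myself): because pre is only updated on first_0's turn, the comparison spans a full joint step, and the equality assert should fire once the frame advances.

from pettingzoo.atari import space_invaders_v2
import numpy as np

env = space_invaders_v2.env()
env.reset(seed=42)
pre, reward, termination, truncation, info = env.last()

for agent in env.agent_iter():
    if termination or truncation:
        break
    env.step(env.action_space(agent).sample())
    observation, reward, termination, truncation, info = env.last()
    if agent == "first_0":
        # pre and observation are now a full env step apart (every agent
        # has acted in between), so this should raise once the frame changes.
        assert np.array_equal(pre, observation)
        pre = observation  # moved inside the branch, per the suggestion above

env.close()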

Hi,
Thank you for your response.
My initial idea was that since I'm using an AEC environment, the observation and reward would be updated after each agent acts. I didn't realize that Atari is originally parallel and gets converted from parallel to AEC. That explains why the observation doesn't get updated until both agents have acted.

I'm using a reinforcement learning library (Tianshou) that only supports AEC environments in multi-agent settings. For Atari games, this results in collecting (state, action, reward, next_state) tuples where state equals next_state for the first agent. Is there a way to have an AEC Atari environment that updates the observation and reward after each agent's action? Or is Atari inherently parallel, meaning I need to find another library or modify the library/algorithm I'm using to get the desired result?

I'm sorry, I don't have a good answer to your questions. The little exposure I have had to Tianshou in multi-agent settings didn't go well, and I haven't looked at it since. As far as I know, all the Atari envs are originally parallel and wrapped to match the AEC interface; I do not know what would be involved in having them update after every agent move.
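
If it helps, here is a minimal sketch of using the parallel API directly, where every agent acts in each step and the observation therefore advances on every call to step(). (This assumes your training loop can consume the parallel interface rather than the AEC one.)

from pettingzoo.atari import space_invaders_v2
import numpy as np

env = space_invaders_v2.parallel_env()
observations, infos = env.reset(seed=42)
prev = observations["first_0"]

for _ in range(200):
    # All agents act simultaneously, so the frame advances every step.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    if not env.agents:  # episode finished for everyone
        observations, infos = env.reset()
        prev = observations["first_0"]
        continue
    if not np.array_equal(prev, observations["first_0"]):
        print("first_0's observation changed this step")
    prev = observations["first_0"]

env.close()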

You might get better replies asking on the Discord server: https://discord.gg/nhvKkYa6qX