Kautenja / gym-super-mario-bros

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES

Update step function in nes_env.py

Simple-Young opened this issue · comments

   In `nes_env.py` in `nes_py`, the `step` function returns 4 values:
   Returns:
        a tuple of:
        - state (np.ndarray): next frame as a result of the given action
        - reward (float) : amount of reward returned after given action
        - done (boolean): whether the episode has ended
        - info (dict): contains auxiliary diagnostic information
      
  But in `core.py` in `gym`, the `step` function returns 5 values:
  Returns:
        observation (object): this will be an element of the environment's :attr:`observation_space`.
            This may, for instance, be a numpy array containing the positions and velocities of certain objects.
        reward (float): The amount of reward returned as a result of taking the action.
        terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached.
            In this case further step() calls could return undefined results.
        truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
            Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
            Can be used to end the episode prematurely before a `terminal state` is reached.
        info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
            This might, for instance, contain: metrics that describe the agent's performance state, variables that are
            hidden from observations, or individual reward terms that are combined to produce the total reward.
            It also can contain information that distinguishes truncation and termination, however this is deprecated in favour
            of returning two booleans, and will be removed in a future version.
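
To make the difference between the two signatures concrete, here is a minimal sketch of converting an old 4-value result into the 5-value form. The helper name `to_five_values` is hypothetical, and reading the `"TimeLimit.truncated"` key from `info` is an assumption based on the convention that older gym versions used to flag time-limit truncation (the deprecated mechanism the docstring above refers to).

```python
from typing import Any, Dict, Tuple


def to_five_values(step_result: Tuple) -> Tuple[Any, float, bool, bool, Dict]:
    """Convert an old (state, reward, done, info) result to the 5-value form.

    Hypothetical helper for illustration; not part of nes_py or gym.
    """
    if len(step_result) == 5:
        # already in the new form, nothing to do
        return step_result
    if len(step_result) != 4:
        raise ValueError(f"expected 4 or 5 values, got {len(step_result)}")

    state, reward, done, info = step_result
    info = dict(info)  # copy so the caller's dict is not mutated
    # assumption: truncation, if any, was flagged under this info key
    truncated = bool(info.pop("TimeLimit.truncated", False))
    terminated = bool(done) and not truncated
    return state, reward, terminated, bool(done) and truncated, info


# an episode that ended inside the game itself
ended = to_five_values(("frame", 1.0, True, {}))
# an episode that was cut off by a time limit
cut_off = to_five_values(("frame", 0.0, True, {"TimeLimit.truncated": True}))
```

With this split, `done` maps to `terminated` unless the episode was flagged as cut off by a time limit, in which case it maps to `truncated`.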

I'm getting the same error here with the mismatch in expected values to unpack:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 11
      9     if done:
     10         state = env.reset()
---> 11     state, reward, done, info = env.step(env.action_space.sample())
     12     env.render()
     14 env.close()

File ~/anaconda3/lib/python3.10/site-packages/nes_py/wrappers/joypad_space.py:74, in JoypadSpace.step(self, action)
     59 """
     60 Take a step using the given action.
     61 
   (...)
     71 
     72 """
     73 # take the step and record the output
---> 74 return self.env.step(self._action_map[action])

File ~/anaconda3/lib/python3.10/site-packages/gym/wrappers/time_limit.py:50, in TimeLimit.step(self, action)
     39 def step(self, action):
     40     """Steps through the environment and if the number of steps elapsed exceeds ``max_episode_steps`` then truncate.
     41 
     42     Args:
   (...)
     48 
     49     """
---> 50     observation, reward, terminated, truncated, info = self.env.step(action)
     51     self._elapsed_steps += 1
     53     if self._elapsed_steps >= self._max_episode_steps:

ValueError: not enough values to unpack (expected 5, got 4)
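
The error itself is plain Python tuple unpacking and can be reproduced without gym or nes_py installed. In this small sketch, `old_style_step` is a hypothetical stand-in for an environment whose `step` still returns 4 values, and the unpacking line mirrors what `TimeLimit.step` does in the traceback above.

```python
def old_style_step(action):
    """Hypothetical stand-in for a step function with the old 4-value return."""
    return "frame", 0.0, False, {}


try:
    # same unpacking pattern as line 50 of time_limit.py in the traceback
    observation, reward, terminated, truncated, info = old_style_step(0)
except ValueError as error:
    message = str(error)
    print(message)  # not enough values to unpack (expected 5, got 4)
```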

I am dealing with the same issue. It comes from newer gym versions, and it also appears if you try to use gymnasium. They changed the `step` function to return an additional `truncated` flag, and `reset` to return an additional dictionary alongside the state. You will run into the same issue with nes_py as well. I wanted to run an RL model in Ray using this library, but that is nearly impossible without updating this library to gymnasium. You also need a version of nes_py that uses gymnasium; there does seem to be a valid open PR for that update in the nes_py repo, but it has not been approved yet.
I plan to attempt an update of this library to the gymnasium format, but Python is still fairly new to me, so hopefully someone else beats me to it.

I did manage to run the environment with gymnasium by doing the following:

import gym
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from gymnasium.wrappers import StepAPICompatibility, TimeLimit

env = gym.make('SuperMarioBros-v0')
steps = env._max_episode_steps  # get the original max_episode_steps count

env = JoypadSpace(env.env, SIMPLE_MOVEMENT)  # drop the outer TimeLimit, then set the joypad wrapper

def gymnasium_reset(self, **kwargs):
    # accept (and ignore) the extra keyword arguments that gymnasium passes
    return self.env.reset()

# overwrite the old reset so that it accepts the `seed` and `options` args
env.reset = gymnasium_reset.__get__(env, JoypadSpace)

# set TimeLimit back, converting the 4-value step output to 5 values first
env = TimeLimit(StepAPICompatibility(env, output_truncation_bool=True), max_episode_steps=steps)

A small explanation:

Gymnasium's TimeLimit wrapper expects `step` to return 5 values, so we need to wrap the environment in the StepAPICompatibility wrapper. To keep using the TimeLimit wrapper, we first remove the original one (via `env.env`), apply whatever other wrappers we want, e.g., JoypadSpace, then wrap the result with StepAPICompatibility, and finally with TimeLimit.
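
The wrapping order described above can be illustrated with toy classes that need neither gym nor gymnasium. Everything in this sketch is a hypothetical stand-in: `OldStyleEnv` plays the role of the 4-value environment, `ToFiveValues` the role of StepAPICompatibility, and `ToyTimeLimit` the role of TimeLimit. It only shows why the conversion layer has to sit below the time limit, not how the real wrappers are implemented.

```python
class OldStyleEnv:
    """Hypothetical environment with the old 4-value step API."""

    def step(self, action):
        return "frame", 1.0, False, {}


class ToFiveValues:
    """Toy conversion layer: 4 values in, 5 values out."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        # this toy inner env never truncates, so `done` maps to `terminated`
        return state, reward, done, False, info


class ToyTimeLimit:
    """Toy time limit that unpacks 5 values, as in the traceback above."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def step(self, action):
        state, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            truncated = True
        return state, reward, terminated, truncated, info


# correct order: the conversion layer sits between the env and the time limit
env = ToyTimeLimit(ToFiveValues(OldStyleEnv()), max_episode_steps=3)
truncated_flags = [env.step(0)[3] for _ in range(3)]

# wrong order: the time limit receives 4 values and fails to unpack them
try:
    ToyTimeLimit(OldStyleEnv(), max_episode_steps=3).step(0)
    wrong_order_failed = False
except ValueError:
    wrong_order_failed = True
```

Here `truncated_flags` becomes true only on the third step, while the wrongly ordered stack fails on its first `step` call with the same kind of `ValueError` as in the traceback.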

JoypadSpace uses an old version of Gym's reset, so we need to overwrite it to be able to receive extra arguments. A small note: because the new reset ignores those arguments, you will not be able to set seeds to get consistent results across multiple runs with this solution.
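
As an alternative to rebinding `reset` on the instance, the same idea can be written as a small adapter class. This is only a sketch under stated assumptions: `OldResetEnv` and `ResetAdapter` are hypothetical names, and returning an `(observation, info)` pair follows the newer reset convention mentioned earlier in this thread rather than the snippet above, which returns whatever the underlying reset returns. The seed is still dropped, so the caveat about reproducibility applies here too.

```python
class OldResetEnv:
    """Hypothetical environment whose reset() takes no arguments."""

    def reset(self):
        return "first_frame"


class ResetAdapter:
    """Accepts the newer reset keywords and forwards to the old reset."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed=None, options=None):
        # `seed` and `options` are accepted but ignored, so runs are
        # not reproducible through this adapter
        observation = self.env.reset()
        # the old API has no info dict to pass along, so return an empty one
        return observation, {}


env = ResetAdapter(OldResetEnv())
observation, info = env.reset(seed=42)
```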