Kautenja / gym-super-mario-bros

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES

Customizable Reward Space

Kautenja opened this issue

The reward space is statically defined in the super-mario-bros Lua file. Is there a way to parameterize the different elements of the reward space so they can be set through the Python API? For instance:

  • optional move right reward
  • optional time penalty
  • optional death penalty
  • optional points reward
  • optional coins reward

The new lockstep implementation supports this very naturally using environment keys. At setup, the reward scheme can be specified, and the Lua script can then construct the reward mapping function get_reward dynamically.

A better idea is probably to send all the streams of data to Python and define gym wrappers there for specifying the reward. This lets end users define custom reward schemes and keeps things simple. It would only require about 12 bytes to send all streams of potential reward data:

Name                Range             Bytes
score               [0, 999990]       6
coins               [0, 99]           1
lives               {-1, 0}           1
x position          (-infty, infty)   2
flagpole / axe get  {0, 1}            1
time                [-300, 0]         1
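
To make the idea concrete, here is a minimal Python sketch of how the streams in the table could be combined into a reward on the Python side. The names RewardStreams, RewardWeights, and compute_reward are hypothetical and not part of gym-super-mario-bros, and the sketch assumes each stream arrives as a per-step value. A weight of 0 switches a stream off, which covers the "optional" elements listed in the issue.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RewardStreams:
    """One step's worth of reward data (hypothetical container)."""
    score: int = 0       # points earned
    coins: int = 0       # coins collected
    lives: int = 0       # -1 on death, otherwise 0
    x_position: int = 0  # change in x position
    flag_get: int = 0    # 1 when the flagpole / axe is reached
    time: int = 0        # change in the in-game clock (<= 0)


@dataclass(frozen=True)
class RewardWeights:
    """Per-stream weights chosen by the end user; 0 disables a stream."""
    score: float = 0.0
    coins: float = 0.0
    lives: float = 0.0
    x_position: float = 0.0
    flag_get: float = 0.0
    time: float = 0.0


def compute_reward(streams, weights):
    """Return the weighted sum of all reward streams."""
    return sum(
        getattr(weights, f.name) * getattr(streams, f.name)
        for f in fields(RewardStreams)
    )


# Example scheme: move-right reward, time penalty, and death penalty only.
weights = RewardWeights(x_position=1.0, time=1.0, lives=15.0)
step = RewardStreams(x_position=3, time=-1, lives=-1, coins=1)
reward = compute_reward(step, weights)
```

A gym wrapper could call a function like compute_reward inside its step method and return the result in place of the environment's built-in reward.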

x position is realistically bounded in something like [0, 4000]. ppaquette's implementation has a table of the max x values for each of the 32 levels, but regardless, 2 bytes will hold the number without a problem.

Though one byte can't encode [-300, 0] for time, there is no use case for a frame skip that causes a time loss greater than a couple of ticks. Realistically it is at worst [-5, 0], which fits in a byte.
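
As a sanity check on the 12-byte figure, the layout can be sketched with Python's struct module. This is only an illustration under assumptions: the field order, the big-endian byte order, and the encoding of the score as a 6-byte integer are choices made here, not the project's actual wire format, and pack_streams / unpack_streams are hypothetical helpers.

```python
import struct

# 6-byte score, unsigned byte coins, signed byte lives,
# signed 2-byte x position, unsigned byte flag, signed byte time.
# ">" selects big-endian with no padding between fields.
STREAM_FORMAT = ">6sBbhBb"


def pack_streams(score, coins, lives, x_position, flag_get, time):
    """Pack the six reward streams into a 12-byte payload."""
    return struct.pack(
        STREAM_FORMAT,
        score.to_bytes(6, "big"),  # assumed encoding for the score
        coins,
        lives,
        x_position,
        flag_get,
        time,  # struct raises an error if this falls outside [-128, 127]
    )


def unpack_streams(payload):
    """Recover the six reward streams from a 12-byte payload."""
    raw_score, coins, lives, x_position, flag_get, time = struct.unpack(
        STREAM_FORMAT, payload
    )
    return (int.from_bytes(raw_score, "big"), coins, lives,
            x_position, flag_get, time)


payload = pack_streams(999990, 99, -1, 4000, 1, -5)
size = struct.calcsize(STREAM_FORMAT)
```

With this layout the signed 2-byte x position covers [-32768, 32767], comfortably more than the realistic [0, 4000] range, and a signed byte covers the realistic time range.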

The new all-Python design makes this feature easier to implement. However, the additional complexity of the feature (how to integrate it in a clean, unintrusive, intuitive way) seems to outweigh the potential benefits. Closing the issue for now.