Customizable Reward Space
Kautenja opened this issue
The reward space is statically defined in the super-mario-bros Lua file. Is there a way to parameterize the different elements of the reward space so they can be set through the Python API? For instance:
- optional move right reward
- optional time penalty
- optional death penalty
- optional points reward
- optional coins reward
The new lockstep implementation supports this very naturally using environment keys. At setup, the reward scheme can be specified, and the Lua script can then construct the reward mapping function get_reward dynamically.
A better idea is probably just to send all the streams of data to Python and define gym wrappers there for specifying the reward. This lets end users define custom reward schemes and keeps things simple. It would require only about 12 bytes to send all streams of potential reward data.
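As a rough illustration of the wrapper idea, here is a minimal sketch. The names `RewardWeights`, `compute_reward`, and `CustomRewardWrapper` are hypothetical, and the sketch assumes the environment's `step` returns `(obs, reward, done, info)` with the per-step reward streams placed in `info`. It is duck-typed rather than derived from gym's own wrapper classes so that it runs without gym installed.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Hypothetical per-stream weights; a weight of 0 disables that term."""
    x_position: float = 1.0  # move-right reward
    time: float = 1.0        # time penalty (the time stream is <= 0)
    lives: float = 1.0       # death penalty (the lives stream is -1 on death)
    score: float = 0.0       # points reward
    coins: float = 0.0       # coins reward


def compute_reward(streams, weights):
    """Combine the per-step reward streams into a single scalar.

    `streams` maps a stream name to its value for the last step.
    Streams missing from the mapping count as 0.
    """
    return (
        weights.x_position * streams.get("x_position", 0)
        + weights.time * streams.get("time", 0)
        + weights.lives * streams.get("lives", 0)
        + weights.score * streams.get("score", 0)
        + weights.coins * streams.get("coins", 0)
    )


class CustomRewardWrapper:
    """Replace the environment's reward with a weighted sum of the streams.

    Assumption: env.step(action) returns (obs, reward, done, info) and
    `info` holds the reward streams under the keys used above.
    """

    def __init__(self, env, weights):
        self.env = env
        self.weights = weights

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        return obs, compute_reward(info, self.weights), done, info
```

With `RewardWeights(lives=15.0)`, for example, a step that moves 3 pixels right, loses one time tick, and loses a life would yield `3 - 1 - 15 = -13.0`. Setting a weight to 0 switches the corresponding term off, which covers the "optional" items in the list above.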
Name | Range | Bytes |
---|---|---|
score | [0, 999990] | 6 |
coins | [0, 99] | 1 |
lives | {-1, 0} | 1 |
x position | (-infty, infty) | 2 |
flagpole / axe get | {0, 1} | 1 |
time | [-300, 0] | 1 |
The x position is realistically bounded in something like [0, 4000]. ppaquette's implementation has a table of the max x values for each of the 32 levels, but regardless, 2 bytes will hold the number without a problem.
Though one byte can't encode [-300, 0] for time, there is no use case for a frame skip that causes a time loss greater than a couple of ticks. It's realistically at worst [-5, 0], which fits in a byte.
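To make the byte budget concrete, the sketch below packs the six streams from the table into a 12-byte packet and unpacks them again. The field order, the big-endian byte order, the choice of signed versus unsigned fields, and the names `FIELDS`, `pack_streams`, and `unpack_streams` are all assumptions for illustration, not the encoding the project uses.

```python
# (name, size in bytes, signed) for each stream, following the table above.
# Order, endianness, and signedness are assumptions made for this sketch.
FIELDS = (
    ("score", 6, False),
    ("coins", 1, False),
    ("lives", 1, True),       # -1 on death, otherwise 0
    ("x_position", 2, True),  # realistically within [0, 4000]
    ("flag_get", 1, False),   # flagpole / axe get
    ("time", 1, True),        # realistically within [-5, 0]
)

PACKET_SIZE = sum(size for _, size, _ in FIELDS)  # 6 + 1 + 1 + 2 + 1 + 1 = 12


def pack_streams(values):
    """Encode a dict of stream values into a fixed-size packet.

    Raises OverflowError if a value does not fit in its field.
    """
    packet = bytearray()
    for name, size, signed in FIELDS:
        packet += int(values[name]).to_bytes(size, "big", signed=signed)
    return bytes(packet)


def unpack_streams(packet):
    """Decode a packet produced by pack_streams back into a dict."""
    if len(packet) != PACKET_SIZE:
        raise ValueError(f"expected {PACKET_SIZE} bytes, got {len(packet)}")
    values = {}
    offset = 0
    for name, size, signed in FIELDS:
        values[name] = int.from_bytes(packet[offset:offset + size], "big", signed=signed)
        offset += size
    return values
```

A signed single byte covers [-128, 127], so a time change in [-5, 0] fits, while a value such as -300 makes `pack_streams` raise `OverflowError` instead of silently wrapping around.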
The new all-Python design makes this feature easier to implement. However, the additional complexity of the feature (how to integrate it in a clean, unintrusive, intuitive way) seems to outweigh the potential benefits. Closing the issue for now.