Customizable Reward Space

Question

Kautenja opened this issue 6 years ago

The reward space is statically defined in the super-mario-bros Lua file. Is there a way to parameterize the individual elements of the reward space so they can be set through the Python API? For instance

Christian Kauten · Answer 1 · Fri Apr 27 2018 11:43:07 GMT+0800 (China Standard Time)

The new lockstep implementation supports this very naturally using environment keys. The reward scheme can be specified at setup, and the Lua script can then construct the reward mapping function get_reward dynamically.
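The thread does not show the Lua side, so the following is only a Python sketch of the same idea: a reward function assembled at setup time from a scheme. The `make_get_reward` helper, the weighted-sum form, and the stream names are all assumptions made for illustration, not the project's actual API.

```python
def make_get_reward(scheme):
    """Build a reward function from a scheme given at setup time.

    `scheme` maps a stream name to a weight. Both the names and the
    weighted-sum form are hypothetical; the thread does not specify them.
    """
    weights = dict(scheme)  # copy so later changes to `scheme` have no effect

    def get_reward(streams):
        # Only the streams named in the scheme contribute to the reward.
        return sum(weight * streams[name] for name, weight in weights.items())

    return get_reward


# Example: reward rightward movement and penalize losing a life.
get_reward = make_get_reward({"x_position": 1.0, "lives": 25.0})
step_reward = get_reward({"x_position": 4, "lives": -1, "coins": 2})
```

Streams that the scheme leaves out, such as `coins` here, are simply ignored.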

Christian Kauten · Answer 2 · Fri May 04 2018 06:50:36 GMT+0800 (China Standard Time)

A better idea is probably just to send all the streams of data to Python and define Gym wrappers there for specifying the reward. This allows end users to define custom reward schemes and keeps things simple. Sending all streams of potential reward data would only require about 12 bytes.

Name	Range	Bytes
score	[0, 999990]	6
coins	[0, 99]	1
lives	{-1, 0}	1
x position	(-∞, ∞)	2
flagpole / axe get	{0, 1}	1
time	[-300, 0]	1
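With the streams in the table available on the Python side, a reward wrapper could look roughly like the sketch below. To stay runnable without Gym installed it uses a plain class with the older `step()` signature `(obs, reward, done, info)`; a real version would presumably subclass `gym.Wrapper`. The `info` keys, `StubEnv`, and `progress_reward` are hypothetical names used only for the example.

```python
class RewardFromStreams:
    """Replace an environment's reward with one computed from raw streams.

    Assumes the wrapped env's step() returns (obs, reward, done, info) and
    that `info` carries the streams from the table under hypothetical keys.
    """

    def __init__(self, env, reward_fn):
        self.env = env
        self.reward_fn = reward_fn

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        # Discard the built-in reward and apply the user's scheme instead.
        return obs, self.reward_fn(info), done, info


class StubEnv:
    """Stand-in environment so the sketch runs without an emulator."""

    def reset(self):
        return None

    def step(self, action):
        info = {"score": 100, "coins": 1, "lives": 0,
                "x_position": 3, "flag_get": 0, "time": -1}
        return None, 0.0, False, info


def progress_reward(streams):
    # One possible scheme: movement, a time penalty, and a goal bonus.
    return streams["x_position"] + streams["time"] + 50 * streams["flag_get"]


env = RewardFromStreams(StubEnv(), progress_reward)
obs, reward, done, info = env.step(0)
```

Keeping the scheme in an ordinary Python function means users can change the reward without touching the emulator side.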

The x position is realistically bounded by something like [0, 4000]. ppaquette's implementation has a table of the maximum x values for each of the 32 levels, but regardless, 2 bytes will hold the number without any problem.

Though one byte can't encode [-300, 0] for time, there is no use case for a frame skip that causes a time loss greater than a couple of ticks. Realistically the range is at worst [-5, 0], which fits in a byte.
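As a rough check of the 12-byte budget, the layout can be sketched with Python's `struct` module. The field order, the little-endian byte order, the signed types, and storing the score as six ASCII digits are all assumptions; the thread only gives the byte counts.

```python
import struct

# Assumed layout with no padding: 6 + 1 + 1 + 2 + 1 + 1 = 12 bytes.
# 6s = score digits, B = coins, b = lives, h = x position,
# B = flagpole / axe flag, b = time.
PACKET = struct.Struct("<6sBbhBb")


def pack_streams(score, coins, lives, x_position, flag_get, time):
    score_digits = f"{score:06d}".encode("ascii")  # one byte per digit
    time = max(time, -128)  # a signed byte cannot go below -128
    return PACKET.pack(score_digits, coins, lives, x_position,
                       int(flag_get), time)


def unpack_streams(payload):
    score_digits, coins, lives, x_position, flag_get, time = PACKET.unpack(payload)
    return {
        "score": int(score_digits.decode("ascii")),
        "coins": coins,
        "lives": lives,
        "x_position": x_position,
        "flag_get": flag_get,
        "time": time,
    }


payload = pack_streams(1200, 3, -1, 42, True, -2)
streams = unpack_streams(payload)
```

A signed 16-bit x position covers the realistic [0, 4000] range with room to spare, and clamping time at -128 is harmless if the per-step loss really stays within [-5, 0].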

Christian Kauten · Answer 3 · Wed Aug 22 2018 14:46:55 GMT+0800 (China Standard Time)

The new all-Python design makes this feature easier to implement. However, the additional complexity of the feature (how to integrate it in a clean, unobtrusive, intuitive way) seems to outweigh the potential benefits. Closing the issue for now.