OpenAI Retro environment support/example
darrellenns opened this issue · comments
This is a feature request to add support/examples for OpenAI Retro environments.
The Retro API is the same as the standard gym environments, however the observations consist of screen images. This probably means we need some sort of preprocessing/CNN/downsample stages.
For some environments, it may be necessary to build an input state using multiple past observed images in order to capture complete state (as things like velocity/acceleration of sprites are not captured in a single frame image).
Categorical actions are supported; see agents.scripts.networks.feed_forward_categorical() as an example. To use CNNs, you can just modify that function.
You can stack multiple past frames using env = agents.tools.wrappers.FrameHistory(env, [0, 1, 10]), which combines the current frame, the previous frame, and the frame from 10 steps ago.
You will have to set the chunk_length in your config to avoid training on full episodes at once, because they will likely not fit into GPU memory. You can set chunk_length = 1 if you don't use RNNs.
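To make this concrete, here is a minimal config sketch. In the agents library, configs are plain functions that return locals(); the name retro_config and the batch_size value are assumptions for illustration only.

```python
# Hypothetical config sketch: keep chunks short so rollouts fit into
# GPU memory. chunk_length = 1 is safe for feed-forward networks.
def retro_config():
    chunk_length = 1   # no RNN state, so single-step chunks are fine
    batch_size = 10    # example value, not a recommendation
    return locals()

config = retro_config()
```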
Thanks for the pointers @danijar - that is very helpful info. I've been poking around with it a bit over the past few days. I'm thinking something like this should work:
- Add config options to allow using retro envs in addition to standard gym. This requires an environment name and an initial state file. Maybe something like:
env = 'Riverraid-Atari2600'
env_type = 'retro'  # default to 'gym' for standard gym environments
env_state = 'Start.state'
- Add a config option for FrameHistory wrapper
- Add support for MultiBinary action spaces (the retro environments I've looked at all seem to use this)
- Add a network suitable for retro environments (MultiBinary output and input layers/preprocessing suited to image data)
I think most of this should be pretty straightforward to implement. The part I'm a little unsure of right now is the best way to work with the MultiBinary action space. I'm thinking maybe it needs to be a sigmoid output from the network, which is then converted to binary in a post-processing step using a greater-than threshold?
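The thresholding idea can be sketched in pure Python (a real implementation would operate on TF tensors; the function names here are illustrative only):

```python
import math

def sigmoid(x):
    """Squash a raw logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logits_to_multibinary(logits, threshold=0.5):
    """Convert raw network outputs to a MultiBinary action vector
    by thresholding each independent sigmoid probability."""
    return [1 if sigmoid(l) > threshold else 0 for l in logits]
```

For example, logits_to_multibinary([2.0, -1.5, 0.1]) yields [1, 0, 1].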
I'll put together a pull request if I can come up with something that is clean and works well.
The config.env value can be either a string or an environment constructor. If Retro environments can't be created via gym.make(name), then you should be able to use env = create_my_retro_env and define that function so that it returns an environment instance.
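A minimal sketch of that constructor, assuming the gym-retro package and reusing the game and state names proposed earlier in this thread:

```python
def create_my_retro_env():
    """Return a fresh Retro environment instance."""
    import retro  # imported lazily so this module loads without gym-retro
    return retro.make('Riverraid-Atari2600', state='Start.state')

# In the config, pass the function itself rather than an instance:
#   env = create_my_retro_env
```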
Adding an option for history frames to the config that defaults to None is a good idea. The wrapper should be applied in _create_environment() in scripts/train.py and scripts/visualize.py. We could also move that function into scripts/utility.py to avoid duplication.
The MultiBinary action space would be best parameterized by a tfd.Bernoulli distribution in the network definition. Sampling from it will give a bit vector of zeros and ones.
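As a pure-Python stand-in for what sampling from such a Bernoulli head does (in the real network this would be tfd.Bernoulli(logits=...) operating on TF tensors; the function name and rng hook here are illustrative):

```python
import math
import random

def sample_multibinary(logits, rng=random.random):
    """Sample an independent 0/1 bit per logit, Bernoulli-style:
    each logit is squashed to a probability, then compared to a
    uniform random draw."""
    probs = [1.0 / (1.0 + math.exp(-l)) for l in logits]
    return [1 if rng() < p else 0 for p in probs]
```

With extreme logits the samples are effectively deterministic, e.g. sample_multibinary([10.0, -10.0]) almost always returns [1, 0].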