google-research / batch-ppo

Efficient Batched Reinforcement Learning in TensorFlow

OpenAI Retro environment support/example

darrellenns opened this issue

This is a feature request to add support/examples for OpenAI Retro environments.

The Retro API is the same as that of the standard gym environments; however, the observations consist of screen images. This probably means we need some sort of preprocessing/CNN/downsampling stages.

For some environments, it may be necessary to build an input state from multiple past observed images in order to capture the complete state (things like the velocity and acceleration of sprites are not captured in a single frame).

Categorical actions are supported; see agents.scripts.networks.feed_forward_categorical() as an example. To use CNNs, you can just modify that function.
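A minimal sketch of such a modification, assuming the network signature and AttrDict return convention used in agents.scripts.networks (both may differ between versions of the library; the layer sizes are arbitrary):

import agents
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def cnn_categorical(config, action_size, observations, unused_length, state=None):
  """Hypothetical CNN policy for image observations with categorical actions.

  Modeled on feed_forward_categorical(); the signature and the AttrDict
  return value are assumptions about the installed version of agents.
  """
  # Fold the (batch, time) leading dimensions so conv layers see NHWC input.
  height, width, channels = observations.shape.as_list()[2:]
  frames = tf.reshape(observations, [-1, height, width, channels])
  x = tf.layers.conv2d(frames, 32, 8, strides=4, activation=tf.nn.relu)
  x = tf.layers.conv2d(x, 64, 4, strides=2, activation=tf.nn.relu)
  x = tf.layers.flatten(x)
  hidden = tf.layers.dense(x, 256, activation=tf.nn.relu)
  logits = tf.layers.dense(hidden, action_size)
  value = tf.layers.dense(hidden, 1)[..., 0]
  # Restore the (batch, time) leading dimensions.
  batch_time = tf.shape(observations)[:2]
  logits = tf.reshape(logits, tf.concat([batch_time, [action_size]], 0))
  value = tf.reshape(value, batch_time)
  policy = tfd.Categorical(logits=logits)
  return agents.tools.AttrDict(state=state, policy=policy, value=value)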

You can stack multiple past frames using env = agents.tools.wrappers.FrameHistory(env, [0, 1, 10]), which combines the current frame, the previous frame, and the frame from 10 steps ago.

You will have to set chunk_length in your config to avoid training on full episodes at once, because they will likely not fit into GPU memory. You can set chunk_length = 1 if you don't use RNNs.
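For example, in a config function (a hypothetical excerpt; the locals()-returning style follows agents.scripts.configs, and the keys other than chunk_length are assumptions):

def my_retro():
  # Hypothetical config sketch; chunk_length is the relevant setting here.
  env = 'Riverraid-Atari2600'
  max_length = 10000   # episode cap (assumed config key)
  chunk_length = 1     # break episodes into single-step chunks (no RNN)
  return locals()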

Thanks for the pointers @danijar - that is very helpful info. I've been poking around with it a bit over the past few days. I'm thinking something like this should work:

  • Add config options to allow using a Retro env in addition to standard gym environments. This requires an environment name and an initial state file. Maybe something like the following (see the dispatch sketch after this list):
env = 'Riverraid-Atari2600'
env_type = 'retro'  # default to 'gym' for standard gym environments
env_state = 'Start.state'
  • Add a config option for the FrameHistory wrapper
  • Add support for MultiBinary action spaces (the retro environments I've looked at all seem to use this)
  • Add a network suitable for retro environments (MultiBinary output and input layers/preprocessing suited to image data)
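A rough sketch of the env_type dispatch from the first item (a hypothetical helper; env_type and env_state are the proposed config keys, and retro.make is the gym-retro entry point):

import gym
import retro

def make_retro_or_gym_env(config):
  # Hypothetical helper dispatching on the proposed env_type config key.
  if getattr(config, 'env_type', 'gym') == 'retro':
    # gym-retro mirrors the gym API; state picks the initial save state.
    return retro.make(config.env, state=config.env_state)
  return gym.make(config.env)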

I think most of this should be pretty straightforward to implement. The part I'm a little unsure about right now is the best way to work with the MultiBinary action space. Maybe it needs sigmoid outputs from the network, converted to binary in a post-processing step with a greater-than threshold?
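In code, that thresholding idea would look roughly like this (hypothetical names; hidden is the last dense layer and num_buttons the size of the MultiBinary space):

import tensorflow as tf

def threshold_actions(hidden, num_buttons):
  # Sigmoid probability per button, then a hard 0.5 cutoff to get bits.
  probs = tf.sigmoid(tf.layers.dense(hidden, num_buttons))
  # Caveat: a hard threshold is deterministic, so it yields no
  # log-probabilities for PPO's surrogate objective.
  return tf.cast(probs > 0.5, tf.int32)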

I'll put together a pull request if I can come up with something that is clean and works well.

The config.env value can be either a string or an environment constructor. If Retro environments can't be created via gym.make(name), you should be able to set env = create_my_retro_env and define that function so that it returns an environment instance.
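For example (a hypothetical constructor, assuming gym-retro's retro.make and the game/state values from above):

import retro

def create_my_retro_env():
  # Return a Retro environment instance for the game and state named above.
  return retro.make('Riverraid-Atari2600', state='Start.state')

# In the config, pass the constructor instead of a name string:
env = create_my_retro_env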

Adding an option for history frames to the config that defaults to None is a good idea. The wrapper should be applied in _create_environment() in scripts/train.py and scripts/visualize.py. We could also move that function into scripts/utility.py to avoid duplication.
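Sketched against _create_environment (frame_history is a hypothetical config key defaulting to None; the existing wrappers applied there are omitted for brevity):

import gym
import agents

def _create_environment(config):
  # Accept either a constructor or a gym name string, as described above.
  env = config.env() if callable(config.env) else gym.make(config.env)
  if config.frame_history:  # hypothetical key, e.g. [0, 1, 10]; default None
    env = agents.tools.wrappers.FrameHistory(env, config.frame_history)
  return env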

The MultiBinary action space would be best parameterized by a tfd.Bernoulli distribution in the network definition. Sampling from that will give a bit vector with zeros and ones.
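A minimal sketch of that head (hidden and num_buttons are assumed inputs; num_buttons corresponds to the MultiBinary space's n attribute):

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def multi_binary_head(hidden, num_buttons):
  # One logit per button of the MultiBinary action space.
  logits = tf.layers.dense(hidden, num_buttons)
  # Independent Bernoulli per bit; wrapping in Independent sums the per-bit
  # log-probs so log_prob() returns one value per sampled action vector.
  return tfd.Independent(tfd.Bernoulli(logits=logits), 1)

# Usage: policy.sample() yields a bit vector of zeros and ones.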