google-deepmind / reverb

Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research

Use With Vectorized Environments

wbrenton opened this issue · comments

What are the best practices for using Reverb with a vectorized environment? Any help is appreciated, thank you.

Hey,

I'm not really sure what you are asking, but the type of environment should not have much impact on how you use Reverb.

@acassirer By vectorized environments I mean that you interact with a batch of environments every time you call .reset() or .step() on the environment API.
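To make the shapes concrete, here is a minimal NumPy sketch of what "a batch of environments" means. `ToyVecEnv` is a hypothetical stand-in written for illustration only; it is not a class from Reverb or from any environment library, and the observation shape and the 10% termination chance are arbitrary assumptions.

```python
import numpy as np


class ToyVecEnv:
    """Hypothetical stand-in for a vectorized environment (illustration only)."""

    def __init__(self, num_envs, obs_shape=(4, 86, 86), seed=0):
        self.num_envs = num_envs
        self.obs_shape = obs_shape
        self._rng = np.random.default_rng(seed)

    def _observe(self):
        # One observation per sub-environment, stacked along axis 0.
        return self._rng.random((self.num_envs, *self.obs_shape), dtype=np.float32)

    def reset(self):
        return self._observe()

    def step(self, actions):
        # Exactly one action is expected per sub-environment.
        assert actions.shape == (self.num_envs,)
        rewards = self._rng.random(self.num_envs, dtype=np.float32)
        dones = self._rng.random(self.num_envs) < 0.1  # arbitrary termination chance
        return self._observe(), rewards, dones, {}


envs = ToyVecEnv(num_envs=8)
obs = envs.reset()
actions = np.zeros(envs.num_envs, dtype=np.int64)
next_obs, rewards, dones, infos = envs.step(actions)
print(obs.shape, next_obs.shape, rewards.shape, dones.shape)
```

Every array returned by `reset()` and `step()` carries a leading axis of size N, so a single call advances all N environments at once.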

Here is a motivating example for why I think the question is worthwhile.

# Assumes `make_envs`, `rb_client` (a reverb.Client), `args`, and `policy` are defined elsewhere.
# `policy` is a placeholder for whatever selects the batch of actions.
N = args.num_envs
envs = make_envs(num_parallel_env=N, env_id="Breakout-v5")
obs = envs.reset()
print(obs.shape)  # (N, 4, 86, 86)

# One trajectory writer for each env.
trajectory_writers = [
    rb_client.trajectory_writer(num_keep_alive_refs=args.rollout_length)
    for _ in range(N)
]
while True:
    actions = policy(obs)  # placeholder; one action per env, shape (N,)
    next_obs, rewards, dones, infos = envs.step(actions)
    # next_obs.shape == (N, 4, 86, 86)
    # rewards.shape == (N,), one scalar reward per env

    # Loop over every environment and write the experience to its respective writer.
    for idx in range(N):
        trajectory_writer = trajectory_writers[idx]
        trajectory_writer.append({
            'obs': obs[idx],
            'actions': actions[idx],
            'rewards': rewards[idx],
            'dones': dones[idx],
        })
        if trajectory_writer.episode_steps >= 2:
            trajectory_writer.create_item(
                table='uniform_experience_replay',
                priority=1.,
                trajectory={
                    'obs': trajectory_writer.history['obs'][:-1],
                    'next_obs': trajectory_writer.history['obs'][-1:],
                    'actions': trajectory_writer.history['actions'][:-1],
                    'rewards': trajectory_writer.history['rewards'][:-1],
                    'dones': trajectory_writer.history['dones'][:-1],
                })

    obs = next_obs  # carry the new observations into the next iteration

Having to iterate over every environment is quite slow and defeats the purpose of using a vectorized environment. Surely there must be a better way; I'm just unable to find it in the codebase.

In case it's still not 100% clear: what I'm looking for is a way to write a batch of experiences from N environments to the table without having to maintain a separate writer for each of the N environments.
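To make the request concrete, the following is a rough sketch of one possible direction, not a confirmed Reverb recommendation: treat the whole batch as a single stream, so that each appended step keeps its leading axis of size N and one writer is enough. `RecordingWriter` is a hypothetical test double that is NOT part of Reverb; it mimics only the writer members used in the snippet above (`append`, `history`, `episode_steps`, `create_item`) so the shapes can be checked without a running server. Whether a real trajectory writer and the table's signature accept this layout is an assumption that would need to be verified.

```python
import numpy as np


class RecordingWriter:
    """Hypothetical test double; it is NOT part of Reverb."""

    def __init__(self):
        self.history = {}       # column name -> list of appended arrays
        self.episode_steps = 0  # number of append() calls so far
        self.items = []         # every (table, priority, trajectory) created

    def append(self, step):
        for key, value in step.items():
            self.history.setdefault(key, []).append(np.asarray(value))
        self.episode_steps += 1

    def create_item(self, table, priority, trajectory):
        self.items.append((table, priority, trajectory))


def append_batched_step(writer, obs, actions, rewards, dones, table):
    """Append one step whose arrays all carry a leading batch axis of size N."""
    writer.append({'obs': obs, 'actions': actions, 'rewards': rewards, 'dones': dones})
    if writer.episode_steps >= 2:
        # One item per step: the previous batched step plus the newest observations.
        writer.create_item(
            table=table,
            priority=1.0,
            trajectory={
                'obs': writer.history['obs'][-2:-1],
                'next_obs': writer.history['obs'][-1:],
                'actions': writer.history['actions'][-2:-1],
                'rewards': writer.history['rewards'][-2:-1],
                'dones': writer.history['dones'][-2:-1],
            },
        )


N = 8
writer = RecordingWriter()
rng = np.random.default_rng(0)
for _ in range(3):
    append_batched_step(
        writer,
        obs=rng.random((N, 4, 86, 86), dtype=np.float32),
        actions=rng.integers(0, 4, size=N),
        rewards=rng.random(N, dtype=np.float32),
        dones=np.zeros(N, dtype=bool),
        table='uniform_experience_replay',
    )

table, priority, trajectory = writer.items[-1]
print(len(writer.items), trajectory['obs'][0].shape, trajectory['next_obs'][0].shape)
```

The trade-off with this layout is that each stored item bundles all N environments together, so priorities can no longer be set per environment and episode boundaries have to be recovered from the `dones` column after sampling.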