Use With Vectorized Environments

Question

wbrenton opened this issue a year ago

What are the best practices for using Reverb with a vectorized environment? Any help is appreciated, thank you.

Albin Cassirer · Answer 1 · Wed May 24 2023 17:08:59 GMT+0800 (China Standard Time)

Hey,

I'm not really sure what you are asking, but the type of environment should not have much impact on how you use Reverb.

wbrenton · Answer 2 · Wed May 24 2023 21:59:29 GMT+0800 (China Standard Time)

@acassirer By vectorized environments I mean that you interact with a batch of environments every time you call .reset() or .step() on the environment API.
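
For readers unfamiliar with the term, the interface can be pictured with a toy stand-in like the one below. This is only a sketch: `ToyVectorEnv` is a hypothetical class made up for illustration, not part of Reverb or any environment library, and it returns zeros instead of simulating anything. The point is only that every return value carries a leading axis of size N.

```python
import numpy as np


class ToyVectorEnv:
    """Hypothetical stand-in for a vectorized environment (not a real library class)."""

    def __init__(self, num_envs, obs_shape=(4, 86, 86)):
        self.num_envs = num_envs
        self.obs_shape = obs_shape

    def reset(self):
        # One observation per sub-environment, stacked along axis 0.
        return np.zeros((self.num_envs, *self.obs_shape), dtype=np.float32)

    def step(self, actions):
        # `actions` must hold one action per sub-environment.
        assert len(actions) == self.num_envs
        next_obs = np.zeros((self.num_envs, *self.obs_shape), dtype=np.float32)
        rewards = np.zeros(self.num_envs, dtype=np.float32)
        dones = np.zeros(self.num_envs, dtype=bool)
        infos = [{} for _ in range(self.num_envs)]
        return next_obs, rewards, dones, infos


envs = ToyVectorEnv(num_envs=8)
obs = envs.reset()
next_obs, rewards, dones, infos = envs.step(np.zeros(8, dtype=np.int64))
print(obs.shape, rewards.shape, dones.shape)  # (8, 4, 86, 86) (8,) (8,)
```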

Here is a motivating example for why I think the question is worthwhile.

envs = make_envs(num_parallel_env=N, env_id="Breakout-v5")
obs = envs.reset()
print(obs.shape) # (N, 4, 86, 86)

# one trajectory writer for each env
trajectory_writers = [rb_client.trajectory_writer(num_keep_alive_refs=args.rollout_length) for _ in range(N)]
while True:
    # `policy` is a placeholder for the agent's action selection (not shown here)
    actions = policy(obs)
    next_obs, rewards, dones, infos = envs.step(actions)
    # next_obs.shape = (N, 4, 86, 86)
    # rewards.shape = (N,)  # one scalar reward per env

    # loop over every environment and write the experience to its respective writer
    for idx in range(N):
        trajectory_writer = trajectory_writers[idx]
        trajectory_writer.append({
            'obs': obs[idx],
            'actions': actions[idx],
            'rewards': rewards[idx],
            'dones': dones[idx],
        })
        # a transition needs at least two steps in the writer's history
        if trajectory_writer.episode_steps >= 2:
            trajectory_writer.create_item(
                table='uniform_experience_replay',
                priority=1.,
                trajectory={
                    'obs': trajectory_writer.history['obs'][:-1],
                    'next_obs': trajectory_writer.history['obs'][-1:],
                    'actions': trajectory_writer.history['actions'][:-1],
                    'rewards': trajectory_writer.history['rewards'][:-1],
                    'dones': trajectory_writer.history['dones'][:-1],
                })

    # carry the new observations into the next iteration
    obs = next_obs

Having to iterate over every environment is quite slow and defeats the purpose of using a vectorized environment. Surely there must be a better way; I'm just unable to find it in the codebase.

In case it's still not 100% clear, what I'm looking for is a way to write a batch of experiences from N environments to the table without having to maintain a writer for each of the N environments.
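
To make the request concrete, the sketch below shows the kind of batched layout being asked for, using plain NumPy. It does not use Reverb at all, and it does not claim that Reverb offers such an interface: `BatchedHistory`, `append`, and `latest_transitions` are hypothetical names invented for illustration. It keeps a short window of batched steps and slices out one transition for all N environments in a single operation, with no per-environment loop.

```python
import numpy as np


class BatchedHistory:
    """Hypothetical buffer holding the last `max_steps` batched steps.

    This is NOT part of Reverb; it only illustrates the data layout.
    """

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = []  # each entry: dict of arrays with leading axis N

    def append(self, step):
        self.steps.append(step)
        # Drop the oldest step once the window is full.
        if len(self.steps) > self.max_steps:
            self.steps.pop(0)

    def latest_transitions(self):
        """Return the most recent transition for all N environments at once."""
        prev, last = self.steps[-2], self.steps[-1]
        return {
            'obs': prev['obs'],
            'next_obs': last['obs'],
            'actions': prev['actions'],
            'rewards': prev['rewards'],
            'dones': prev['dones'],
        }


N = 4
history = BatchedHistory(max_steps=2)
for t in range(3):
    # Fill each step with the value t so the slicing is easy to check.
    history.append({
        'obs': np.full((N, 4, 86, 86), t, dtype=np.float32),
        'actions': np.full(N, t, dtype=np.int64),
        'rewards': np.zeros(N, dtype=np.float32),
        'dones': np.zeros(N, dtype=bool),
    })

batch = history.latest_transitions()
print(batch['obs'].shape, batch['next_obs'].shape)  # (4, 4, 86, 86) (4, 4, 86, 86)
```

Note that this sketch ignores episode boundaries: when individual environments finish at different times, a transition that straddles a reset would still need to be masked or handled per environment.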