sash-a / hrl_pybullet_envs

Locomotion HRL envs in pybullet

Uniform random agent not moving in AntGather environment

nikhilrayaprolu opened this issue · comments

I tried the example code provided in the README. The agent does not move from the initial position it was dropped at; it only jitters its body a little.

import hrl_pybullet_envs
import gym
import numpy as np

env = gym.make('AntGatherBulletEnv-v0')
env.render()
ob = env.reset()
tot_rew = 0

for i in range(1000):
  # Take random actions
  ob, rew, done, _ = env.step(np.random.uniform(-1, 1, env.action_space.shape))
  tot_rew += rew

  if done: break

print(f'Achieved total reward of: {tot_rew}')

That's expected: it's performing random actions each time step, so on average it will likely stay in the same spot.

But even in the case of the PointGatherEnv, the red cube does not move when executing random actions.

Change

env.step(np.random.uniform(-1, 1, env.action_space.shape))

to

env.step(np.ones(env.action_space.shape))

Thanks, @sash-a Can you also provide PointMazeEnv?

Also, some examples with a trained agent in the provided Colab would help. Probably running a Stable Baselines agent would be enough:
https://stable-baselines3.readthedocs.io/en/master/

PointMazeEnv may come in the future, but unfortunately I have pressing deadlines at the moment and it is not at the top of my list of envs to implement, AntPush and AntFall will likely come first.

The point of these environments is that they generally require hierarchical reinforcement learning to solve, so Stable Baselines likely would not cut it. Regardless, that is beyond the scope of this repository. At the moment it exists simply for my own research, since I could not find non-MuJoCo versions of these envs. Anyone who wants to use these envs is welcome to, but I don't really have enough time to create baselines. You are more than welcome to try running some standard RL algorithms on these envs, see if they work, and make a contribution like a benchmark.md; it would be much appreciated.

As a side note, I did at one point try PPO on AntGather and it got a reward of 0, but that was a couple of months ago and a very quick test.