sash-a / hrl_pybullet_envs

Locomotion HRL envs in pybullet

Uniform random agent not moving in AntGather environment

nikhilrayaprolu opened this issue · comments

I tried the example code provided in the README. The agent does not move from the initial position it was dropped at; it only jitters its body a little.

import hrl_pybullet_envs
import gym
import numpy as np

env = gym.make('AntGatherBulletEnv-v0')
env.render()
ob = env.reset()
tot_rew = 0

for i in range(1000):
  # Take random actions
  ob, rew, done, _ = env.step(np.random.uniform(-1, 1, env.action_space.shape))
  tot_rew += rew

  if done: break

print(f'Achieved total reward of: {tot_rew}')

That's expected: it's performing random actions each time step, so on average it will likely stay in the same spot.

But even in the case of the PointGatherEnv, the red cube does not move when executing random actions.

Change

env.step(np.random.uniform(-1, 1, env.action_space.shape))

to

env.step(np.ones(env.action_space.shape))

Thanks, @sash-a Can you also provide PointMazeEnv?

Also, some examples with a trained agent in the provided Colab would help. Probably running a Stable Baselines agent would be enough:
https://stable-baselines3.readthedocs.io/en/master/

PointMazeEnv may come in the future, but unfortunately I have pressing deadlines at the moment and it is not at the top of my list of envs to implement, AntPush and AntFall will likely come first.

The point of these environments is that they generally require hierarchical reinforcement learning to solve, so Stable Baselines likely would not cut it. Regardless, that is beyond the scope of this repository. At the moment it exists simply for my own research, since I could not find non-MuJoCo versions of these envs. Anyone who wants to use these envs is welcome to, but I don't really have enough time to create baselines. You are more than welcome to try running some standard RL algorithms on these envs, see if they work, and make a contribution like a benchmark.md; it would be much appreciated.

As a side note, I did at one point try PPO on AntGather and it got a reward of 0, but that was a couple of months ago and a very quick test.