robotology / gym-ignition

Framework for developing OpenAI Gym robotics environments simulated with Ignition Gazebo

Home page: https://robotology.github.io/gym-ignition

Multiple resets before stepping makes observations junk

HorvathDawson opened this issue

Description:

When multiple resets happen before the initial step, the observation values become junk.

Example of bad observations:

[2.48752798e+10 2.48752798e+13 1.92311709e+10 1.92311709e+13
 1.05943662e+10 1.05943662e+13 1.73760567e+10 1.73760567e+13
 4.64308078e+09 4.64308078e+12]
[ 5.76040155e-01 -4.83243193e-01 -5.72812931e-01  3.71046713e+00
  1.72736902e-03  3.44344853e+00  3.04615830e-01  6.65370534e-01
  3.43991476e-03 -4.19276817e-01]
[ 0.57428055 -1.75960042 -0.57034015  2.47277984  0.00307352  1.34615452
  0.30532197  0.7061436   0.00305073 -0.38917982]
[ 1.30214752e+10  1.30214752e+13  2.32740358e+11  2.32740358e+14
  2.37263629e+11  2.37263629e+14  8.51912783e+10  8.51912783e+13
 -2.55717189e+10 -2.55717189e+13]

Example of expected observations:

[ 0.70883607  3.06453891 -0.1113847   1.32162828  0.11874101  0.86413771
  0.18686704 -0.34228004  0.0448376   0.46870266]
[ 0.71061966  1.78359314 -0.10892029  2.46441255  0.11872182 -0.01918816
  0.18653529 -0.33174444  0.04530129  0.46369175]
[ 0.71425176  3.63210301 -0.10729784  1.62245239  0.12016356  1.44174148
  0.18617237 -0.3629249   0.04576737  0.46607455]
[ 0.71870738  4.45561494 -0.10559593  1.70190148  0.12232909  2.16552533
  0.18579153 -0.38084443  0.04623357  0.46620485]
[ 0.72338892  4.6815451  -0.10484045  0.75548776  0.12440563  2.07653708
  0.18539936 -0.39216616  0.04670759  0.47402296]

Steps to reproduce

Note: gym_bb is our custom gym-ignition environment. The repository containing the code can be found here:
https://github.com/Baesian-Balancer/gym-bb

import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers

from gym_ignition.utils.typing import Action, Reward, Observation

env_id = "Monopod-Gazebo-v1"

def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)



make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)

# The presence of this initial reset causes the bad observations.
# Removing this reset makes them good again.
observation = env.reset()

for epoch in range(1000):

    observation = env.reset()

    done = False

    while not done:
        action = env.action_space.sample()
        observation, reward, done, _ = env.step(action)
        print(observation)

env.close()
time.sleep(5)
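Rather than eyeballing the printed arrays, the loop above could flag junk observations automatically. A minimal sketch, assuming the observation space is a bounded Box whose low and high arrays are passed in; is_junk_observation is a hypothetical helper, not part of gym-ignition or gym_bb, and the ±10 bounds in the example are made up for illustration.

```python
import math
from typing import Iterable, Sequence


def is_junk_observation(
    observation: Iterable[float],
    low: Sequence[float],
    high: Sequence[float],
) -> bool:
    """Return True if any entry is non-finite or outside its [low, high] bound."""
    values = list(observation)
    if not len(values) == len(low) == len(high):
        raise ValueError("observation and bounds must have the same length")
    return any(
        not math.isfinite(value) or not lo <= value <= hi
        for value, lo, hi in zip(values, low, high)
    )


# Illustration only: these bounds are NOT the Monopod's real limits.
low, high = [-10.0] * 3, [10.0] * 3
print(is_junk_observation([0.70883607, 3.06453891, -0.1113847], low, high))  # False
print(is_junk_observation([2.48752798e10, 2.48752798e13, 1.92311709e10], low, high))  # True
```

Inside the episode loop it could be called as is_junk_observation(observation, env.observation_space.low, env.observation_space.high), assuming a gym Box observation space; gym's Box.contains performs a similar shape and bounds check.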

Environment

  • OS: Pop!_OS 20.04
  • GPU: 1650 RTX
  • Python: 3.8.10
  • Version:
  • Channel:
    • Stable
  • Installation type:
    • User
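The Version field in the report above was left empty. Assuming gym-ignition and scenario were installed with pip (as in the reproduction commands further down this thread), one minimal way to collect their versions is importlib.metadata from the standard library, available since Python 3.8. The helper name installed_versions is hypothetical, and the Ignition Gazebo distribution itself is a system package that this does not cover.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Dict, Iterable, Optional


def installed_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each pip distribution name to its installed version (None if absent)."""
    versions: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["gym-ignition", "scenario", "gym"]).items():
        print(f"{name}: {version or 'not installed'}")
```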

If I followed your code correctly, the method that you call multiple times is MonopodBase.reset_task. I can see that here you call the *.to_gazebo().reset_* methods. After these calls, a gazebo.run(paused=True) is necessary in order to update the state of the simulator. This cannot be done from the task, since the task should not control the simulator itself; it is not designed to do so.
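A deliberately simplified toy model of what the paused run is for. ToySimulator and its methods are hypothetical and do not mirror the ScenarIO or Ignition Gazebo API; the only behaviour borrowed from the comment above is that state written through reset_* calls is not visible to observations until the simulator runs, and that a paused run applies it without advancing the physics.

```python
from typing import Optional


class ToySimulator:
    """Toy stand-in for a simulator; NOT the ScenarIO/Gazebo API."""

    def __init__(self) -> None:
        self.published_position = 0.0  # state that observations read
        self._pending_position: Optional[float] = None  # requested by reset_*
        self.steps = 0  # number of physics steps actually integrated

    def reset_joint_position(self, position: float) -> None:
        # Like a to_gazebo().reset_* call: only request the new state.
        self._pending_position = position

    def run(self, paused: bool = False) -> bool:
        # Any run (paused or not) applies pending resets...
        if self._pending_position is not None:
            self.published_position = self._pending_position
            self._pending_position = None
        # ...but only a non-paused run advances the physics.
        if not paused:
            self.steps += 1
        return True

    def get_observation(self) -> float:
        return self.published_position


sim = ToySimulator()
sim.reset_joint_position(0.5)
stale = sim.get_observation()  # still 0.0: the reset is not applied yet
sim.run(paused=True)
fresh = sim.get_observation()  # 0.5 after the paused run, no physics step
print(stale, fresh, sim.steps)  # 0.0 0.5 0
```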

In general, given how you structured your environment (the intended way 😉), you should try to avoid any to_gazebo call in your task class. This would make your class fully compatible with all ScenarIO backends (currently only Gazebo, but this is the right mindset for obtaining engine-agnostic tasks, which are one of the desiderata of the project).

I think that a possible fix would be to move this randomization to... the randomizer, which is where it belongs. The randomizer, unlike the task, does have access to the GazeboSimulator object and can reset the model to the desired position and velocity before env.reset (which calls Task.reset_task and then Task.get_observation, in that order) is called. You don't have to do this yourself, since it is already done in:

ok_paused_run = self.env.gazebo.run(paused=True)

So, to recap, if I am right, you can solve the problem by moving this logic after these lines.
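One way to read the suggestion above: the randomizer, which owns the simulator, resets the model state, a paused run applies that state, and only then does env.reset call Task.reset_task followed by Task.get_observation. The sketch below uses stand-in classes that only record the call order; every name is hypothetical, and the exact place of the paused run inside gym-ignition may differ (here it simply follows the reset_* calls, as the earlier comment requires).

```python
from typing import List


class RecordingSimulator:
    def __init__(self, log: List[str]) -> None:
        self.log = log

    def reset_model_state(self) -> None:
        # Stand-in for the to_gazebo().reset_* calls on the model.
        self.log.append("simulator.reset_model_state")

    def run(self, paused: bool = False) -> bool:
        self.log.append(f"simulator.run(paused={paused})")
        return True


class RecordingTask:
    def __init__(self, log: List[str]) -> None:
        self.log = log

    def reset_task(self) -> None:
        # Engine-agnostic: no simulator-specific calls in the task.
        self.log.append("task.reset_task")

    def get_observation(self) -> List[float]:
        self.log.append("task.get_observation")
        return [0.0]


class RecordingRandomizer:
    """Owns the simulator, so it is the place allowed to call reset_* and run()."""

    def __init__(self, simulator: RecordingSimulator, task: RecordingTask) -> None:
        self.simulator = simulator
        self.task = task

    def reset(self) -> List[float]:
        # 1. Randomize the model state through the simulator.
        self.simulator.reset_model_state()
        # 2. Paused run so the new state is applied before anything reads it.
        if not self.simulator.run(paused=True):
            raise RuntimeError("Failed to execute a paused run")
        # 3. Equivalent of env.reset(): reset_task, then get_observation.
        self.task.reset_task()
        return self.task.get_observation()


log: List[str] = []
randomizer = RecordingRandomizer(RecordingSimulator(log), RecordingTask(log))
randomizer.reset()
for entry in log:
    print(entry)
```

Keeping all simulator calls in the randomizer is what lets the task class stay backend-agnostic.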

Hello @diegoferigo,

Thank you so much for the very thorough answer. It has helped me understand the intended structure a lot better.

I just changed my environment to perform the reset in the randomizer instead. However, the weird observation values still appear when there are multiple resets before stepping. When I move the second reset to after the epoch, like this:
[image]

The issue does not occur. It seems that the only time this issue happens is when there are two resets before the first step. It isn't a very bad bug (except for being hard to find).

I have a few follow-up questions about how to structure the environments to work on a real robot, and about the recommended way to implement my own ScenarIO backend for my robot. However, I will move these over to GitHub Discussions.

Strange behavior; I'm not really sure whom to blame :) I tried on my setup, which is based on Ignition Fortress + our devel branch, and I get the following:

Script
import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers

from gym_ignition.utils.typing import Action, Reward, Observation

env_id = "Monopod-Gazebo-v1"

def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)



make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)

# Try to reset multiple times
for _ in range(10):
    print(env.reset())
gym-bb on main via 🐍 v3.8.10 🅒 /conda took 5s
✦ ❯ ipython script.py
INFO: Making new env: Monopod-Gazebo-v1 ({'physics_engine': 0})
[Wrn] [ServerConfig.cc:860] IGN_GAZEBO_SERVER_CONFIG_PATH set but no file found, no plugins loaded
WARN: Box bound precision lowered by casting to float64
[-0.4608653  -0.01303166  0.4608653  -0.04270197 -0.0042243  -0.04540492
  0.30490549  0.04584364 -0.00212186 -0.03880929]
[-0.32561324 -0.03355728  0.32561324 -0.00463403  0.00401983 -0.03226874
  0.29692544  0.02540731  0.0045288  -0.04842035]
[ 0.37949755 -0.03606021 -0.37949755 -0.01683971 -0.00187991 -0.01650839
  0.30255707  0.04046996 -0.00129108 -0.02424057]
[-0.00804839  0.04977905  0.00804839  0.02675372  0.00226717 -0.0341412
  0.30092956 -0.02311185 -0.00035272 -0.04039219]
[ 5.18247062e-01 -1.19754151e-02 -5.18247062e-01  6.42832349e-03
 -2.45487448e-03 -1.03810239e-02  2.96015569e-01  4.23681635e-02
 -3.95638763e-03  2.18586356e-04]
[ 0.55658223  0.01573295 -0.55658223 -0.01650758 -0.00189508 -0.03451867
  0.30277819 -0.03263973 -0.00453006 -0.02803389]
[-7.90700858e-02  2.79145583e-02  7.90700858e-02 -3.65495938e-03
  2.11424871e-03  2.26854122e-02  3.04595148e-01 -8.03847243e-03
  1.37859649e-04  1.12630982e-02]
[-0.53846333  0.04530475  0.53846333 -0.00717272 -0.00228832  0.00621186
  0.29886509  0.03637921 -0.00247402 -0.04152875]
[-1.18975457e-01 -3.37652576e-03  1.18975457e-01 -3.58842517e-02
 -5.03471231e-05 -1.80997566e-02  3.01628825e-01  1.45664316e-03
  3.00660125e-03 -4.49255152e-02]
[-0.19562574 -0.02750057  0.19562574 -0.03728941 -0.00491874  0.00266767
  0.2960702   0.03454619  0.00497083 -0.03656328]

which seems ok, right?

@diegoferigo Yes, that seems correct. I just tried the same script on my setup and got the same results. However, after modifying the script a bit, I found a minimal example that reproduces the bad behaviour.

import gym
import functools
from gym_bb import randomizers

env_id = "Monopod-v1"


def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)


make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(
    env=make_env, reward_class_name='BalancingV1')
env.seed(42)

# Try to reset multiple times
action = env.action_space.sample()
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.step(action))
print(env.step(action))

which gave this output:

[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.15407057e-06, -1.36305522e-13,  3.00293390e-01, -8.77843805e-17,
        1.33360434e-13, -1.15407057e-03, -1.36305522e-10,  2.93390090e-01,
       -8.77843805e-14,  1.33360434e-10]), 0.30029339009044886, False, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.07746080e-02,  5.44272987e-03,  3.00654145e-01, -3.60783009e-04,
       -6.41573538e-03, -1.07746080e+01,  5.44272987e+00,  6.54144668e-01,
       -3.60783009e-01, -6.41573538e+00]), 0.30065414466836143, False, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.53336308e-02,  4.97875066e-04,  3.00904667e-01,  1.52682613e-04,
       -1.49304940e-02, -1.53336308e+01,  4.97875066e-01,  9.04667237e-01,
        1.52682613e-01, -1.49304940e+01]), 0.3009046672371765, False, {})
WARN: The observation does not belong to the observation space
(array([ 1.07756050e+12,  2.58790241e+11,  2.14876645e+10, -5.72973833e+09,
        1.86600059e+12,  1.07756050e+15,  2.58790241e+14,  2.14876645e+13,
       -5.72973833e+12,  1.86600059e+15]), 21487664492.98367, True, {})

I am very confused by what is happening here.
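When bisecting an ordering-dependent bug like this one, it can help to drive the environment from a list of calls instead of editing print statements by hand, so that the minimal example above and a variant without the back-to-back reset go through the same code path. A minimal sketch; run_sequence and CountingEnv are hypothetical helpers, and CountingEnv only stands in for the real Monopod environment so the snippet runs on its own.

```python
from typing import Any, Dict, List, Sequence, Tuple


def run_sequence(env: Any, calls: Sequence[str], action: Any) -> List[Tuple[str, Any]]:
    """Drive env with "reset"/"step" calls in the given order and collect results.

    Assumes the classic Gym API used in this thread: reset() returns an
    observation and step(action) returns (observation, reward, done, info).
    """
    results: List[Tuple[str, Any]] = []
    for call in calls:
        if call == "reset":
            results.append((call, env.reset()))
        elif call == "step":
            results.append((call, env.step(action)))
        else:
            raise ValueError(f"unknown call: {call!r}")
    return results


class CountingEnv:
    """Stand-in environment that only counts steps since the last reset."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: Any) -> Tuple[int, float, bool, Dict[str, Any]]:
        self.t += 1
        return self.t, 0.0, False, {}


# Same order of calls as the minimal example above.
MINIMAL_EXAMPLE = ["reset", "step", "reset", "reset", "step", "reset", "step", "step"]
# Same sequence without the back-to-back reset, for comparison.
WITHOUT_DOUBLE_RESET = ["reset", "step", "reset", "step", "reset", "step", "step"]

for call, result in run_sequence(CountingEnv(), MINIMAL_EXAMPLE, action=None):
    print(call, result)
```

With the real environment, the env and the sampled action from the script above would be passed instead of CountingEnv() and None.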

I updated to v1.3.0 and Ignition Fortress, and the results are worse. Because of #402, I cannot render the environment to make sure everything is still running okay, but after running the above script again on the new version I got these results:

[Wrn] [ServerConfig.cc:860] IGN_GAZEBO_SERVER_CONFIG_PATH set but no file found, no plugins loaded
WARN: Box bound precision lowered by casting to float64
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-1.15415634e-06, -1.36304940e-13,  3.00293390e-01, -8.77607449e-17,
        1.33359213e-13, -1.15415634e-03, -1.36304940e-10,  2.93390091e-01,
       -8.77607449e-14,  1.33359213e-10]), 0.30029339009134215, False, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
WARN: The observation does not belong to the observation space
(array([-2.87089631e+11,  3.22248969e+10,  1.18477461e+10,  3.16331085e+09,
       -2.71644090e+11, -2.87089631e+14,  3.22248969e+13,  1.18477461e+13,
        3.16331085e+12, -2.71644090e+14]), 11847746128.844805, True, {})
[0.  0.  0.3 0.  0.  0.  0.  0.3 0.  0. ]
(array([-3.08154667e+11,  3.42800180e+10,  1.18057544e+10,  3.15218282e+09,
       -2.90660812e+11, -3.08154667e+14,  3.42800180e+13,  1.18057544e+13,
        3.15218282e+12, -2.90660812e+14]), 11805754354.421728, True, {})
(array([ 3.43994049e+22, -1.00836623e+22, -6.41627739e+14, -1.22564216e+15,
        1.33670549e+22,  3.43994049e+25, -1.00836623e+25, -6.41639545e+17,
       -1.22564531e+18,  1.33670549e+25]), -641627739487769.9, True, {})

I created a clean Ubuntu Focal system by executing the following commands in a Docker container:

# Start the container with: docker run -it ubuntu:focal bash

apt update
export IGNITION_DISTRIBUTION="fortress"
export IGNITION_DEFAULT_CHANNEL="stable"
apt install virtualenv wget lsb-release gnupg2 git
echo "deb http://packages.osrfoundation.org/gazebo/ubuntu-${IGNITION_DEFAULT_CHANNEL} `lsb_release -cs` main" > \
    /etc/apt/sources.list.d/gazebo-${IGNITION_DEFAULT_CHANNEL}.list
wget http://packages.osrfoundation.org/gazebo.key -qO - | apt-key add -
apt update
apt install ignition-fortress

virtualenv /tmp/venv
source /tmp/venv/bin/activate
pip install -U pip

pip install git+https://github.com/Baesian-Balancer/gym-bb
pip install ipython
pip install -U "gym-ignition==1.3.0" "scenario==1.3.0"

sed -i "s|from . import monitor|# from . import monitor|g" /tmp/venv/lib/python3.8/site-packages/gym_bb/__init__.py

Then I executed the script (running it multiple times yields reproducible results):

import gym
import time
import functools
from gym_ignition.utils import logger
from gym_bb import randomizers

from gym_ignition.utils.typing import Action, Reward, Observation

env_id = "Monopod-Gazebo-v1"

def make_env_from_id(env_id: str, **kwargs) -> gym.Env:
    import gym
    import gym_bb
    return gym.make(env_id, **kwargs)

make_env = functools.partial(make_env_from_id, env_id=env_id)

env = randomizers.monopod.MonopodEnvRandomizer(env=make_env)
env.seed(42)

# Try to reset multiple times
action = env.action_space.sample()
print(env.reset())
print(env.step(action))
print(env.reset())
print(env.reset())
print("===>")
print(env.step(action))
print("<===")
print(env.reset())
print(env.step(action))
print(env.step(action))

Output:

[-0.4608653  -0.01303166  0.4608653  -0.04270197 -0.0042243  -0.04540492
  0.30490549  0.04584364 -0.00212186 -0.03880929]
(array([-4.60865300e-01, -5.30130273e-10,  4.60865300e-01, -9.42064204e-11,
       -4.22430323e-03, -2.22622365e-10,  3.04943663e-01,  3.81699690e-02,
       -2.16212358e-03, -4.02639642e-02]), 8.704630994507406, False, {})
[-0.32561324 -0.03355728  0.32561324 -0.00463403  0.00401983 -0.03226874
  0.29692544  0.02540731  0.0045288  -0.04842035]
[ 0.37949755 -0.03606021 -0.37949755 -0.01683971 -0.00187991 -0.01650839
  0.30255707  0.04046996 -0.00129108 -0.02424057]
===>
WARN: The observation does not belong to the observation space
(array([-1.88644495e+09, -1.88644495e+12, -3.55188390e+10, -3.55188390e+13,
       -1.68572931e+10, -1.68572931e+13,  2.60987435e+07,  2.60987432e+10,
        6.60888064e+07,  6.60888064e+10]), 1.2805865140623118e-11, True, {})
<===
[-0.00804839  0.04977905  0.00804839  0.02675372  0.00226717 -0.0341412
  0.30092956 -0.02311185 -0.00035272 -0.04039219]
(array([-4.87550921e+10, -4.87550921e+13, -8.57572631e+11, -8.57572631e+14,
       -3.89485695e+11, -3.89485695e+14,  1.63443832e+10,  1.63443832e+13,
       -1.15210537e+09, -1.15210537e+12]), 2.044843067068891e-14, True, {})
(array([ 6.14321713e+17,  6.14321762e+20,  2.78311246e+22,  2.78311246e+25,
        7.79909646e+17,  7.79910036e+20, -7.52601307e+15, -7.52602941e+18,
        1.91471666e+15,  1.91471781e+18]), 4.440814240705095e-20, True, {})

I couldn't visualize the environment from the container I created on the fly, but the simulation is indeed exploding. I suspect it depends on the randomized state from which the model is initialized. Are you sure there are no configurations in which the model is initialized penetrating the ground? In that scenario, it would of course receive a huge reaction force, and it would make sense for the simulation to explode. After this look, it seems that the problem does not depend on gym-ignition / scenario, but rather on the implementation of the environment.
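To test the ground-penetration hypothesis across many randomized initial states, each sampled state could be checked before the first step. A minimal sketch that works on a plain dictionary of link positions; the link names and numbers are made up, and reading the positions from the model is deliberately left out since it depends on the backend. It only checks link frame origins, not collision geometry, so it is a coarse filter.

```python
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]


def links_below_ground(
    link_positions: Dict[str, Position],
    ground_height: float = 0.0,
    clearance: float = 0.0,
) -> List[str]:
    """Return the sorted names of links whose frame origin is below ground + clearance.

    Coarse check: a link can still intersect the ground through its
    collision geometry even when its origin is above it; a positive
    clearance absorbs part of that.
    """
    threshold = ground_height + clearance
    return sorted(name for name, (_, _, z) in link_positions.items() if z < threshold)


# Hypothetical initial state (made-up link names and numbers).
initial_state = {
    "base": (0.0, 0.0, 0.30),
    "upper_leg": (0.0, 0.0, 0.15),
    "lower_leg": (0.0, 0.0, -0.02),
}
print(links_below_ground(initial_state, clearance=0.01))  # ['lower_leg']
```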

I have tried isolating the problem using the above idea, making it 100% impossible for the monopod to penetrate the ground. I also have a new version of our environment that changes a large part of the code base relative to the current implementation (including removing the reset randomization), and it still has this issue.

The extra confusing part is that when you remove the extra reset, everything is fine again, no matter how many training episodes you run. This makes me believe that it isn't the model clipping into the ground because of the randomizer, or something in the main logic of the environment. There must be some weird underlying condition that changes with the order of resets.

I have dug into my code base pretty thoroughly and can't find the culprit. I think we should close this issue for now; if I find the cause, I will follow up in this same thread. :)

Thank you as always @diegoferigo

Sure, feel free to reopen this issue if needed. Closing.