Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Home Page: https://gymnasium.farama.org

[Bug Report] RecordVideo causing segmentation fault for Mujoco environments

declanoller opened this issue

Describe the bug

I installed mujoco==3.1.3 from pip. I had gymnasium==0.29.1 from pip as well, but I was getting the solver_iter bug referenced here. So I upgraded to the most recent gymnasium with pip install gymnasium==1.0.0a1, which solved it.

Now I'm trying to record videos and getting a segfault. I put several code examples below to show what works and what doesn't, but here's the summary:

  • non-mujoco envs (classic control or Box2D) record video just fine
  • running a mujoco env with no render_mode argument runs fine (no video though obviously)
  • running a mujoco env with render_mode="human" runs fine (also no video)
  • running a mujoco env with render_mode="rgb_array" and the RecordVideo wrapper produces a segfault

The error isn't very informative:

$ python gymnasium_record_video.py 
/home/declan/.local/lib/python3.10/site-packages/gymnasium/wrappers/rendering.py:282: UserWarning: WARN: Overwriting existing videos at /home/declan/Videos folder (try specifying a different `video_folder` for the `RecordVideo` wrapper if this is not desired)
  logger.warn(
Segmentation fault (core dumped)

Code example

# ##################### Works:

import gymnasium as gym
env = gym.make("HalfCheetah-v5")

env.reset()
done = False
while not done:
    next_state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        break

env.close()

# ##################### Works:

import gymnasium as gym
env = gym.make("HalfCheetah-v5", render_mode="human")

env.reset()
done = False
while not done:
    next_state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        break

env.close()

# ##################### Works, makes video:

import gymnasium as gym
env = gym.make("CartPole-v1", render_mode="rgb_array")
env = gym.wrappers.RecordVideo(
    env=env,
    video_folder="/home/declan/Videos/",
)

env.reset()
done = False
while not done:
    next_state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        break

env.close()

# ##################### Segfaults with error above:

import gymnasium as gym
env = gym.make("HalfCheetah-v5", render_mode="rgb_array")
env = gym.wrappers.RecordVideo(
    env=env,
    video_folder="/home/declan/Videos/",
)

env.reset()
done = False
while not done:
    next_state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        break

env.close()

System info

$ pip freeze | grep mujoco
mujoco==3.1.3

$ pip freeze | grep gymnasium
gymnasium==1.0.0a1

$ python --version
Python 3.10.12

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo

Thanks for reporting. This is surprising, and I imagine the segfault must originate outside of Gymnasium: we do everything in Python, whereas the error message suggests the crash comes from C++.

Booting up my Ubuntu machine, I can replicate all your conditions; however, I don't get the segfault :(

Could you try removing the video recorder while keeping the render mode as rgb_array?
If that does not segfault, then there must be something weird in the interaction between the video recorder and MuJoCo.

Given the segfault, it would be helpful to know when it occurs.
Could you find out whether it errors on env construction, reset, the first step, the 1000th step (the environment ends after 1000 steps, which causes the video to be rendered), or close?
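
One way to narrow down the stage, as a rough sketch rather than anything Gymnasium provides: enable Python's standard faulthandler module so that a segfault dumps the Python traceback, and print a flushed marker around each stage. The names stage and locate_crash are hypothetical helpers for illustration; locate_crash assumes gymnasium and mujoco are installed and reuses the environment and wrapper from the report, with "videos" as an assumed output folder.

```python
import faulthandler
import sys
from contextlib import contextmanager

# Dump the Python traceback to stderr if the process receives a fatal signal
# such as SIGSEGV.
faulthandler.enable()


@contextmanager
def stage(name):
    """Print flushed markers so the last line shown identifies the crashing stage."""
    print(f"[start] {name}", file=sys.stderr, flush=True)
    yield
    print(f"[done]  {name}", file=sys.stderr, flush=True)


def locate_crash(env_id="HalfCheetah-v5", video_folder="videos"):
    """Run one episode with RecordVideo, marking each stage (hypothetical helper)."""
    # Imported lazily so the helpers above work without gymnasium installed.
    import gymnasium as gym

    with stage("construction"):
        env = gym.make(env_id, render_mode="rgb_array")
        env = gym.wrappers.RecordVideo(env=env, video_folder=video_folder)
    with stage("reset"):
        env.reset()
    with stage("steps"):
        while True:
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                break
    with stage("close"):
        env.close()
```

Calling locate_crash() and reading the last "[start]" line without a matching "[done]" line should indicate which stage crashed; the faulthandler output shows the Python frame that was active at that point.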

I would also test using different MuJoCo backends; see https://github.com/google-deepmind/dm_control?tab=readme-ov-file#rendering for how to change this. It might be an issue with OpenGL only.

Good luck with the debugging

@Kallinteris-Andreas Any more ideas?

Does this work?

import gymnasium as gym
env = gym.make("HalfCheetah-v5", render_mode="rgb_array")

env.reset()
done = False
while not done:
    next_state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    frame = env.render()
    if terminated or truncated:
        break

env.close()

Hi guys, thanks for the fast responses! Here are some updates to answer your questions:

  • I didn't have env.render() in the code examples above, but I added it in the loop (more on that below)
  • I added a step counter so we can see where it crashes, and added prints before and after the env.close()

Below are the tests and results:

render_mode="rgb_array", env.render() in loop, record video:

import gymnasium as gym
env = gym.make("HalfCheetah-v5", render_mode="rgb_array")
env = gym.wrappers.RecordVideo(
    env=env,
    video_folder="/home/declan/Videos/",
)

env.reset()
done = False
i_step = 0
print("Begin loop")
while not done:
    next_state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    frame = env.render()
    print(f"\t{i_step = }, post-render")
    i_step += 1
    if terminated or truncated:
        break

print("Closing env...")
env.close()
print("Env closed!")

segfaults at env.close().

render_mode="rgb_array", env.render() in loop, NO record video:

import gymnasium as gym
env = gym.make("HalfCheetah-v5", render_mode="rgb_array")

env.reset()
done = False
i_step = 0
print("Begin loop")
while not done:
    next_state, reward, terminated, truncated, info = env.step(env.action_space.sample())
    frame = env.render()
    print(f"\t{i_step = }, post-render")
    i_step += 1
    if terminated or truncated:
        break

print("Closing env...")
env.close()
print("Env closed!")

does NOT segfault.

I tried the different backends by putting

import os
os.environ["MUJOCO_GL"] = "glfw"  # or "egl" or "osmesa"

at the top. "glfw" is the default, so that still segfaults. However, both "egl" and "osmesa" work and produce video with no segfault, whoooo!
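
The backend switch can be wrapped in a small helper, sketched here. select_mujoco_backend is a hypothetical name, not part of Gymnasium or MuJoCo; the only facts taken from this thread are the MUJOCO_GL variable and the three values tried. Setting the variable before importing gymnasium is the conservative choice, matching the "at the top" placement used here.

```python
import os

# Backend names tried in this thread.
KNOWN_BACKENDS = ("glfw", "egl", "osmesa")


def select_mujoco_backend(backend):
    """Set MUJOCO_GL to one of the known backends (hypothetical helper)."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; expected one of {KNOWN_BACKENDS}")
    os.environ["MUJOCO_GL"] = backend
    return backend


# Choose the backend first, then import gymnasium and build the environment.
select_mujoco_backend("egl")
# import gymnasium as gym
# env = gym.make("HalfCheetah-v5", render_mode="rgb_array")
```

Rejecting unknown names early avoids silently running with a misspelled backend value.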

So that seems to be a fix at least. A few questions / follow up things:

  • I notice that my script above produced video (with the working backends) whether or not env.render() was commented out. Do I need it then?
  • I notice that the MuJoCo rendering page you linked says to run sudo apt-get install libglfw3 libglew2.0. I checked, and I have libglfw3 version 3.3.6-1 but not libglew2.0; dpkg -l says I have libglew2.2 instead (a separate package from libglew2.0, I think). Could that be the problem?
  • I have another machine I use, so I'll see if it has the same problem there.

What operating system are you using? Are you using x11 or Wayland?

Ubuntu 22.04 and

$ echo $XDG_SESSION_TYPE
wayland

I ran it on my other machine, which has all the same versions of everything mentioned above, and I get exactly the same results: segfault when recording video with the default backend, while video works with "egl" or "osmesa".
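
Based only on the two machines described here (Wayland session, default backend segfaults, "egl" works), a script could fall back to "egl" when it detects a Wayland session. This is a workaround sketch, not a confirmed diagnosis: backend_for_session is a hypothetical helper, and whether X11 sessions are unaffected has not been verified in this thread.

```python
import os


def backend_for_session(session_type, requested=None):
    """Return the MUJOCO_GL value to use, or None to keep the default backend."""
    if requested:
        # An explicit MUJOCO_GL setting from the user always wins.
        return requested
    if (session_type or "").lower() == "wayland":
        # Assumption drawn from this thread: the default backend crashed under Wayland.
        return "egl"
    return None


backend = backend_for_session(
    os.environ.get("XDG_SESSION_TYPE"),
    os.environ.get("MUJOCO_GL"),
)
if backend is not None:
    # Set before importing gymnasium and creating the environment.
    os.environ["MUJOCO_GL"] = backend
```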

Please check whether it works with X11, and create an issue at MuJoCo.
Closing, as this is not a Gymnasium issue.

@Kallinteris-Andreas There might still be an issue between VideoRecorder and MuJoCo which would be a Gymnasium issue.

@declanoller Have you run the same code without the VideoRecorder (and with rgb_array)? If this segfaults, then my guess is that there must be a weird issue between moviepy and MuJoCo's glfw backend.
I'll reopen if this is the case.

The video recorder not working when env.render is called is strange; I'll investigate that.

import os
import shutil

import numpy as np
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

env = RecordVideo(gym.make("CartPole-v1", render_mode="rgb_array"), "videos")
env.reset()
while True:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

    # testing if this causes an issue (and a video is recorded at the end)
    render_out = env.render()
    assert isinstance(render_out, np.ndarray)

    if terminated or truncated:
        break

env.close()

# exactly one video file should have been written; clean up afterwards
assert len(os.listdir("videos")) == 1
shutil.rmtree("videos")

I don't seem to be able to reproduce the issue for the following script