stepjam / RLBench

A large-scale benchmark and learning environment.

Home Page: https://sites.google.com/corp/view/rlbench

Bug report: gym wrapper returns np.bool_, which causes an error when using stable-baselines3

andykim0723 opened this issue

I was trying to connect RLBench with stable-baselines3 and found a minor error.
I used the code from #103, and changed state to vision.

import gym
import rlbench.gym
import stable_baselines3.common.env_checker
from stable_baselines3 import PPO


env = gym.make('reach_target-vision-v0')

print(stable_baselines3.common.env_checker.check_env(env))
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

obs = env.reset()
for i in range(1000):
  action, _states = model.predict(obs, deterministic=True)
  obs, reward, done, info = env.step(action)
  print(obs.shape)
  print(reward)
  if done:
    obs = env.reset()

env.close()

When I run this code, the stable-baselines3 env checker raises the following error:

assert isinstance(done, bool), "The `done` signal must be a boolean"
AssertionError: The `done` signal must be a boolean

This is caused by the incompatibility between np.bool_ and the built-in bool. In rlbench/gym/rlbench_env.py, line 107, terminate is of type np.bool_, which is not a subclass of bool, so isinstance(done, bool) evaluates to False. To fix it, I simply cast the value to bool, and it works:

def step(self, action) -> Tuple[Dict[str, np.ndarray], float, bool, dict]:
    obs, reward, terminate = self.task.step(action)
    # Cast the NumPy boolean to a built-in bool so isinstance(done, bool) holds.
    terminate = bool(terminate)
    return self._extract_obs(obs), reward, terminate, {}
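
For anyone who would rather not patch the RLBench source, the same cast can probably be applied from the outside. The sketch below is only an illustration and needs nothing but NumPy: StubEnv and BoolDoneWrapper are hypothetical names that are not part of RLBench, gym, or stable-baselines3, and the stub merely imitates an environment whose step() returns a NumPy boolean.

```python
import numpy as np


class StubEnv:
    """Hypothetical stand-in for an env whose step() returns a NumPy boolean."""

    def reset(self):
        return np.zeros(3)

    def step(self, action):
        # Comparing a NumPy scalar yields a NumPy boolean, not a built-in bool.
        terminate = np.float32(0.01) < 0.05
        return np.zeros(3), 1.0, terminate, {}


class BoolDoneWrapper:
    """Hypothetical wrapper that casts the done flag to a built-in bool."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Same cast as in the patched step() above, applied outside the env.
        return obs, reward, bool(done), info


raw_done = StubEnv().step(None)[2]
wrapped_done = BoolDoneWrapper(StubEnv()).step(None)[2]

print(isinstance(raw_done, bool))      # False: NumPy booleans do not subclass bool
print(isinstance(wrapped_done, bool))  # True after the cast
```

With a real gym environment the wrapper would presumably subclass gym.Wrapper instead of being a plain class, but the cast itself stays the same.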