DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Home Page: https://stable-baselines3.readthedocs.io


Issue (HER with the SAC algorithm)

wadeKeith opened this issue · comments

🐛 Bug

I have no idea why my code won't run. Is it a problem with the vectorized env, or something else? Please help me check this code.

Code example

import numpy as np
from env import UR5Env
import math
from stable_baselines3.common.env_checker import check_env
import gymnasium as gym

from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
import time
import os

from utilize import linear_schedule

seed = 429
reset_arm_poses = [math.pi, -math.pi/2, -math.pi*5/9, -math.pi*4/9,
                   math.pi/2, 0]
reset_gripper_range = [0, 0.085]
visual_sensor_params = {
    'image_size': [128, 128],
    'dist': 1.0,
    'yaw': 90.0,
    'pitch': -25.0,
    'pos': [0.6, 0.0, 0.0525],
    'fov': 75.0,
    'near_val': 0.1,
    'far_val': 5.0,
    'show_vision': False
}
robot_params = {
    "reset_arm_poses": reset_arm_poses,
    "reset_gripper_range": reset_gripper_range,
}

sim_params = {"use_gui": False,
              'timestep': 1/240,
              'control_type': 'joint',
              'gripper_enable': False}
env_kwargs_dict = {"sim_params": sim_params, "robot_params": robot_params, "visual_sensor_params": visual_sensor_params}

vec_env = make_vec_env(UR5Env, n_envs=1, env_kwargs=env_kwargs_dict, seed=seed)
vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=False)
model = SAC("MultiInputPolicy", vec_env,
            learning_rate=linear_schedule(1e-6),
            replay_buffer_class=HerReplayBuffer,
            replay_buffer_kwargs=dict(n_sampled_goal=1, goal_selection_strategy="future"),
            buffer_size=1000000,
            learning_starts=100,
            batch_size=256,
            tau=0.005,
            gamma=0.99,
            train_freq=(1, "step"),  # (2, "episode"), (5, "step")
            tensorboard_log='./logs',
            seed=seed,
            verbose=1,
            device='cuda')
model.learn(total_timesteps=500000,
            log_interval=10,
            tb_log_name="ur5_robotiq140_sac",
            progress_bar=True)
model.save("./model/ur5_robotiq140_sac")
stats_path = os.path.join('./normalize_file/', "vec_normalize_sac.pkl")
vec_env.save(stats_path)

vec_env.close()
del model, vec_env  # remove to demonstrate saving and loading
sim_params['use_gui'] = True
env_kwargs_dict = {"sim_params": sim_params, "robot_params": robot_params, "visual_sensor_params": visual_sensor_params}
vec_env = make_vec_env(UR5Env, n_envs=1, env_kwargs=env_kwargs_dict, seed=seed)
vec_env = VecNormalize.load(stats_path, vec_env)
vec_env.training = False
vec_env.norm_reward = False
model = SAC.load("./model/ur5_robotiq140_sac", env=vec_env)
obs = vec_env.reset()
dones = False
while not dones:
    action, _states = model.predict(obs, deterministic=True)
    obs, rewards, dones, info = vec_env.step(action)
    vec_env.render("human")
vec_env.close()
exit()

Relevant log output / Error message

Unable to sample before the end of the first episode. We recommend choosing a value for learning_starts that is greater than the maximum number of timesteps in the environment.
  File "/home/zxr/Documents/yincheng/Github/UR5/train_sac.py", line 76, in <module>
    model.learn(total_timesteps=500000,
RuntimeError: Unable to sample before the end of the first episode. We recommend choosing a value for learning_starts that is greater than the maximum number of timesteps in the environment.
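
For context, the check fails because SAC begins sampling from the HER buffer after learning_starts = 100 environment steps, but HER cannot produce samples until at least one episode has finished. A quick, hypothetical sanity check (UR5Env is a custom env, so the spec.max_episode_steps attribute is an assumption; substitute whatever stores the episode length in your implementation):

# Hypothetical sanity check: compare the env's episode cap to learning_starts.
# "spec.max_episode_steps" is an assumption about how UR5Env exposes its limit.
spec = vec_env.get_attr("spec")[0]
max_steps = spec.max_episode_steps if spec is not None else None
print("max episode steps:", max_steps, "| learning_starts:", model.learning_starts)
# HER can only sample after a full episode, so learning_starts should exceed max_steps.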

System Info

  • OS: Linux-5.15.0-101-generic-x86_64-with-glibc2.31 # 111~20.04.1-Ubuntu SMP Mon Mar 11 15:44:43 UTC 2024
  • Python: 3.10.14
  • Stable-Baselines3: 2.3.0
  • PyTorch: 2.2.2+cu121
  • GPU Enabled: True
  • Numpy: 1.26.4
  • Cloudpickle: 3.0.0
  • Gymnasium: 0.29.1
  • OpenAI Gym: 0.26.2

Hello,
the error message is quite explicit, no?
You should increase the warmup phase (learning_starts), and you should have a look at the recommended parameters in the RL Zoo.
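
A minimal sketch of that suggestion, not the maintainer's exact code; the value 2000 is an assumption and only needs to exceed the environment's maximum episode length:

# Hedged sketch: raise learning_starts above the maximum episode length so the
# HER buffer holds at least one complete episode before gradient updates begin.
# 2000 is an assumed value, not a recommendation from the thread.
model = SAC(
    "MultiInputPolicy",
    vec_env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=1, goal_selection_strategy="future"),
    learning_starts=2000,  # > max episode length (assumption)
    batch_size=256,
    verbose=1,
)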

Thanks, the problem is solved. train_freq needed to be modified: training every step was too frequent, so sampling could not work before the first episode had finished.
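
The exact edit was not posted; a plausible sketch, assuming the fix was to train once per episode instead of every step so that updates only run after a complete episode is stored:

# Hedged sketch of the reported fix (the author's exact values are not known):
# update once per episode instead of every step, so the HER buffer always
# contains a finished episode when gradient updates run.
model = SAC(
    "MultiInputPolicy",
    vec_env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=1, goal_selection_strategy="future"),
    train_freq=(1, "episode"),  # was (1, "step")
    gradient_steps=-1,          # as many updates as env steps collected in the episode
    learning_starts=100,
    verbose=1,
)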