Problem with DummyVecEnv wrapped inside a VecNormalize wrapper with render_all() method
0xYelshayeb opened this issue · comments
The problem is that DummyVecEnv resets the environment automatically as soon as an episode is done, so the terminal state you want to render is lost. Also, there was a mistake in your code. Try this:
```python
import gym
import gym_anytrading  # registers 'forex-v0'
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

env_maker = lambda: gym.make('forex-v0', frame_bound=(100, 5000), window_size=10)

# Training env (vectorized, as required by stable-baselines)
env = DummyVecEnv([env_maker])
policy_kwargs = dict(net_arch=[64, 'lstm', dict(vf=[128, 128, 128], pi=[64, 64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)

# Testing env (unwrapped, so the terminal state survives and render_all() is reachable)
env = env_maker()
observation = env.reset()

while True:
    # add a batch dimension, since the model was trained on a vectorized env
    observation = observation[np.newaxis, ...]
    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break

# Plotting results
plt.cla()
env.render_all()
plt.show()
```
Originally posted by @AminHP in #1 (comment)
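The auto-reset behaviour described in the quoted reply can be illustrated with a minimal, dependency-free sketch. The `ToyEnv` and `ToyDummyVecEnv` classes below are illustrative stand-ins that mimic the relevant behaviour of a gym env and of stable-baselines' `DummyVecEnv`; they are not the real classes:

```python
class ToyEnv:
    """A toy environment whose episode ends after 3 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 0.0, done, {}


class ToyDummyVecEnv:
    """Mimics DummyVecEnv's auto-reset: when an episode ends, the env is
    reset immediately and the *initial* observation of the next episode is
    returned instead of the terminal one."""
    def __init__(self, env):
        self.env = env
        self.env.reset()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        if done:
            obs = self.env.reset()  # terminal state is discarded here
        return obs, rew, done, info


venv = ToyDummyVecEnv(ToyEnv())
for _ in range(3):
    obs, _, done, _ = venv.step(0)
# After the third step, done is True but obs is already the fresh reset
# observation (0), not the terminal one (3) — which is why rendering the
# finished episode through the vectorized wrapper fails.
```

This is why the quoted workaround trains on the `DummyVecEnv` but runs the test loop on a plain, unwrapped env.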
I saw this reply on a similar problem I had with the render_all() method. In my case, however, I am using a VecNormalize() wrapper around my DummyVecEnv. The quoted solution trains on a DummyVecEnv and then instantiates a second, unwrapped env for prediction/testing so that render_all() works. That won't work for me, since I need VecNormalize to normalize the observations and rewards during testing as well.
```python
import matplotlib.pyplot as plt
from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecNormalize
from stable_baselines.common.evaluation import evaluate_policy

env = make_vec_env(env_maker, n_envs=1, monitor_dir=log_dir)
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)

model = PPO2('MlpLstmPolicy', env, verbose=1, nminibatches=1, policy_kwargs=policy_kwargs)
callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=log_dir, env=env, verbose=1)
model.learn(total_timesteps=5000, callback=callback, log_interval=10)

# Evaluate with raw (unnormalized) rewards and frozen normalization stats
env.norm_reward = False
env.training = False
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")
# I get the expected reward here using evaluate_policy()

plt.figure(figsize=(15, 6))
plt.cla()
env.render_all()  # This part doesn't work because of the same error
plt.show()
```
What can I do to use the render_all() method (or any other attribute, like env.history, for that matter) while keeping the VecNormalize() wrapper?
Use this code to access the inner env and render it. Here, .venv unwraps VecNormalize down to the DummyVecEnv, .envs[0] selects the single underlying env (which is wrapped in a Monitor because monitor_dir was passed to make_vec_env), and .env unwraps that Monitor to reach the raw trading env:
env.venv.envs[0].env.render_all()
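If the exact wrapper stack varies (for example, no Monitor when monitor_dir is omitted), the unwrapping can be done generically by walking the attributes that the common wrappers expose. The helper `unwrap_to_base` and the `*Like` classes below are an illustrative sketch, not part of stable-baselines; the real wrappers expose the same attributes (`venv`, `envs`, `env`):

```python
def unwrap_to_base(env):
    """Walk down common wrapper attributes until the raw env is reached."""
    while True:
        if hasattr(env, "venv"):      # a VecEnvWrapper such as VecNormalize
            env = env.venv
        elif hasattr(env, "envs"):    # DummyVecEnv keeps its envs in a list
            env = env.envs[0]
        elif hasattr(env, "env"):     # a gym.Wrapper such as Monitor
            env = env.env
        else:
            return env


# Toy stand-ins for the real wrapper classes, for demonstration only:
class Base:
    def render_all(self):
        return "rendered"

class MonitorLike:            # wraps via .env, like Monitor
    def __init__(self, env):
        self.env = env

class DummyVecEnvLike:        # holds a list of envs, like DummyVecEnv
    def __init__(self, envs):
        self.envs = envs

class VecNormalizeLike:       # wraps via .venv, like VecNormalize
    def __init__(self, venv):
        self.venv = venv


wrapped = VecNormalizeLike(DummyVecEnvLike([MonitorLike(Base())]))
base = unwrap_to_base(wrapped)
base.render_all()  # reaches the raw env regardless of the wrapper stack
```

This is more robust than hard-coding `env.venv.envs[0].env`, which breaks as soon as a wrapper is added or removed from the stack.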