AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)


Problem with render_all() when a DummyVecEnv is wrapped inside a VecNormalize wrapper

0xYelshayeb opened this issue

The problem is that DummyVecEnv resets the environment automatically as soon as an episode is done.
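A minimal sketch of that behaviour, assuming gym-anytrading's forex-v0 and stable-baselines' DummyVecEnv (the frame_bound and window_size values here are just placeholders):

import gym
import gym_anytrading  # registers 'forex-v0'
from stable_baselines.common.vec_env import DummyVecEnv

vec_env = DummyVecEnv([lambda: gym.make('forex-v0', frame_bound=(100, 200), window_size=10)])
obs = vec_env.reset()
done = False
while not done:
    # step() takes a batch of actions, one per wrapped env
    obs, rewards, dones, infos = vec_env.step([vec_env.action_space.sample()])
    done = dones[0]
# At this point DummyVecEnv has already called reset() on the inner env,
# so the finished episode's state is no longer visible through the wrapper.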

Also, there was a mistake in your code. Try this:

import gym
import gym_anytrading
import numpy as np
import matplotlib.pyplot as plt
from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

env_maker = lambda: gym.make('forex-v0', frame_bound=(100, 5000), window_size=10)
env = DummyVecEnv([env_maker])

# Training env (vectorized)
policy_kwargs = dict(net_arch=[64, 'lstm', dict(vf=[128, 128, 128], pi=[64, 64])])
model = A2C("MlpLstmPolicy", env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)

# Testing env: a plain, non-vectorized env so render_all() can be called on it
env = env_maker()
observation = env.reset()

while True:
    # add a batch dimension because the model was trained on a vectorized env
    observation = observation[np.newaxis, ...]
    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)
    # env.render()
    if done:
        print("info:", info)
        break

# Plotting results
plt.cla()
env.render_all()
plt.show()

Originally posted by @AminHP in #1 (comment)

I saw this reply while hitting a similar problem with the render_all() method, but in my case I am using a VecNormalize() wrapper around my DummyVecEnv. In the quoted solution, a DummyVecEnv was used for training and a separate plain env was instantiated for prediction/testing so that render_all() could be called on it. That won't work for me, since I need VecNormalize to keep normalizing observations and rewards.

# env_maker, policy_kwargs, log_dir and SaveOnBestTrainingRewardCallback
# (a custom monitoring callback) are defined earlier in my script.
from stable_baselines import PPO2
from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecNormalize
from stable_baselines.common.evaluation import evaluate_policy

env = make_vec_env(env_maker, n_envs=1, monitor_dir=log_dir)
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)

model = PPO2('MlpLstmPolicy', env, verbose=1, nminibatches=1, policy_kwargs=policy_kwargs)
callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=log_dir, env=env, verbose=1)
# model = PPO2('MlpLstmPolicy', env, verbose=1)

model.learn(total_timesteps=5000, callback=callback, log_interval=10)

# stop updating normalization statistics and report unnormalized rewards for evaluation
env.norm_reward = False
env.training = False

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1)
print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

# I get the expected reward here using evaluate_policy()

plt.figure(figsize=(15,6))
plt.cla()
env.render_all()
plt.show()

# This part doesn't work because of the same error

What can I do to use the render_all() method (or any other attribute like env.history, for that matter) while keeping the VecNormalize() environment?

Use this code to access the inner env and render it:
env.venv.envs[0].env.render_all()
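
For reference, a rough sketch of how that chain unwraps in the code above (assuming make_vec_env added a Monitor wrapper because monitor_dir was passed), plus an alternative sketch that goes through stable-baselines' generic VecEnv get_attr / env_method interface instead of reaching into the wrapper chain by hand:

# env is the VecNormalize wrapper -> env.venv is the DummyVecEnv
# env.venv.envs[0] is the Monitor-wrapped env -> .env is the gym-anytrading env
inner_env = env.venv.envs[0].env
inner_env.render_all()
print(inner_env.history)  # trading history kept by the gym-anytrading env

# Alternative: forward through the wrappers via the VecEnv API
env.env_method('render_all')          # calls render_all() on each inner env
history = env.get_attr('history')[0]  # same idea for attributes such as history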