dqn_play.py for Chapter07
dorjeduck opened this issue · comments
Hi Maxim,
first of all, fantastic book, thank you so much for that.
I saw that others have posted about this before, but I couldn't resolve my problem by reading those issues. I am sure it's an easy task, but it seems I'm too dull on my side to see it.
I am struggling to adapt 03_dqn_play.py from Chapter 06 to the examples of Chapter 07. With some minor tweaks I am able to save the best nets during training, but I fail when trying to "play" these nets. My problems start with the different wrappers used in Chapter 07, which cause env.reset() to return a LazyFrames object instead of a plain observation.
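In case it helps frame the problem: gym's LazyFrames defers the frame stacking, but it exposes __array__, so np.asarray() turns it into a regular ndarray that torch.tensor() will accept. Here is a minimal sketch, using a hypothetical stand-in class (the real LazyFrames lives inside gym's Atari wrappers):

```python
import numpy as np

# Hypothetical stand-in for gym's LazyFrames: stacking is deferred
# until something requests a real array via __array__ / np.asarray.
class LazyFrames:
    def __init__(self, frames):
        self._frames = frames

    def __array__(self, dtype=None):
        out = np.stack(self._frames)
        return out.astype(dtype) if dtype is not None else out

# Four stacked 84x84 frames, as in the Atari wrappers.
state = LazyFrames([np.zeros((84, 84), dtype=np.float32) for _ in range(4)])

# np.asarray forces the lazy object into an ndarray; torch.tensor()
# accepts the result where it rejects the LazyFrames object itself.
arr = np.asarray(state)
print(arr.shape)  # (4, 84, 84)
```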
If somebody out there could manage to write a small script to run the trained nets of Chapter 07, I would highly appreciate it if you could share it. Of course, any pointers on how I can get this done myself would also be highly appreciated.
Thanks
martin
I am having the same problem. Here's my code, which probably looks a lot like yours:
import gym
import time
import ptan
import argparse
import numpy as np
import torch
from lib import wrappers
from lib import dqn_model
DEFAULT_ENV_NAME = "PongNoFrameskip-v4"
FPS = 25
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument( "-m", "--model", required = True, help = "Model file to load" )
    parser.add_argument( "-e", "--env", default = DEFAULT_ENV_NAME, help = "Environment name to use, default = " + DEFAULT_ENV_NAME )
    parser.add_argument( "-r", "--record", help = "Directory to store video recording" )
    args = parser.parse_args()

    #env = wrappers.make_env( args.env )
    env = ptan.common.wrappers.wrap_dqn( gym.make( args.env ) )
    if args.record:
        env = gym.wrappers.Monitor( env, args.record )

    net = dqn_model.DQN( env.observation_space.shape, env.action_space.n )
    net.load_state_dict( torch.load( args.model ) )

    state = env.reset()
    total_reward = 0.0

    while True:
        start_ts = time.time()
        env.render()
        state_v = torch.tensor( np.array( [ state ], copy = False ) )
        q_vals = net( state_v )
        _, act_v = torch.max( q_vals, dim = 1 )
        action = int( act_v.item() )
        state, reward, done, _ = env.step( action )
        total_reward += reward
        if done:
            break
        delta = 1 / FPS - ( time.time() - start_ts )
        if delta > 0:
            time.sleep( delta )
    print( "Total reward: {:.2f}".format( total_reward ) )
And here's the error:
File "03_dqn_play.py", line 33, in <module>
state_v = torch.tensor( np.array( [ state ], copy = False ) )
TypeError: int() argument must be a string, a bytes-like object or a number, not 'LazyFrames'
Upon further digging, I found the answer. The code from 03_dqn_play.py does not change. Copy that over unchanged. What you need to change is in the wrappers.py file.
In ImageToPyTorch.observation(), change np.moveaxis to np.swapaxes.
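For what it's worth, the two calls are not interchangeable in general: both move the channel axis to the front, but swapaxes also exchanges height and width. A quick check with a hypothetical non-square frame makes the difference visible (the book's 84x84 frames are square, so there only the values, not the shapes, differ):

```python
import numpy as np

# Hypothetical HWC observation; non-square on purpose so the two
# calls produce different output shapes.
obs = np.arange(84 * 80 * 1, dtype=np.float32).reshape(84, 80, 1)

moved = np.moveaxis(obs, 2, 0)    # (1, 84, 80): channel first, H and W preserved
swapped = np.swapaxes(obs, 2, 0)  # (1, 80, 84): channel first, H and W exchanged

print(moved.shape, swapped.shape)
```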
In make_env(), remove the call that creates a ScaledFloatFrame. The network already does the scaling. Just return env after the BufferWrapper.
Found here: Shmuma/ptan#19 (comment)
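To make the second change concrete, here is a sketch of the suggested make_env() shape, using no-op stand-in classes for the book's wrappers (the real ones, and their exact order, live in Chapter 07's wrappers.py). The only point illustrated is that the chain ends at BufferWrapper, with no ScaledFloatFrame on top, since the network scales the pixels itself:

```python
# Stand-in wrapper base that just records the chain; the real classes
# in wrappers.py actually transform observations.
class _Stub:
    def __init__(self, env, *args):
        self.env = env

class MaxAndSkipEnv(_Stub): pass
class FireResetEnv(_Stub): pass
class ProcessFrame84(_Stub): pass
class ImageToPyTorch(_Stub): pass
class BufferWrapper(_Stub): pass

def make_env(base_env):
    env = MaxAndSkipEnv(base_env)
    env = FireResetEnv(env)
    env = ProcessFrame84(env)
    env = ImageToPyTorch(env)
    env = BufferWrapper(env, 4)
    return env  # stop here: no ScaledFloatFrame wrapped around the result

env = make_env(object())
print(type(env).__name__)  # BufferWrapper
```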
Hey,
thank you so much, anothercodejunkie, for the replies. Unfortunately I am on the move right now and can't implement your suggestions or give you feedback, but I definitely will once things are set up and ready again.
Thanks again
Martin