PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt


dqn_play.py for Chapter07

dorjeduck opened this issue · comments

Hi Maxim,

first of all, fantastic book, thank you so much for that.

I saw that others have posted about this before, but I couldn't resolve my problem by reading those issues. I am sure it's an easy task, but I seem too dense to see it on my side.

I am struggling to adapt 03_dqn_play.py from Chapter 06 to the examples of Chapter 07. With some minor tweaks I am able to save the best nets during training, but I fail when trying to "play" these nets. My problems start with the different wrappers we use in Chapter 07, which result in env.reset() returning a LazyFrames object instead of an observation.

If somebody out there could write a small script to run the trained nets of Chapter 07, I would highly appreciate it if you could share it. Of course, any pointers on how I can get this done myself would also be highly appreciated.

Thanks

martin

I am having the same problem. Here's my code, which probably looks a lot like yours:

import gym
import time
import ptan
import argparse
import numpy as np
import torch
from lib import wrappers
from lib import dqn_model

DEFAULT_ENV_NAME = "PongNoFrameskip-v4"
FPS = 25

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", required=True,
                        help="Model file to load")
    parser.add_argument("-e", "--env", default=DEFAULT_ENV_NAME,
                        help="Environment name to use, default=" + DEFAULT_ENV_NAME)
    parser.add_argument("-r", "--record",
                        help="Directory to store video recording")
    args = parser.parse_args()

    # env = wrappers.make_env(args.env)
    env = ptan.common.wrappers.wrap_dqn(gym.make(args.env))

    if args.record:
        env = gym.wrappers.Monitor(env, args.record)
    net = dqn_model.DQN(env.observation_space.shape, env.action_space.n)
    net.load_state_dict(torch.load(args.model))

    state = env.reset()
    total_reward = 0.0
    while True:
        start_ts = time.time()
        env.render()
        state_v = torch.tensor(np.array([state], copy=False))
        q_vals = net(state_v)
        _, act_v = torch.max(q_vals, dim=1)
        action = int(act_v.item())

        state, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
        delta = 1 / FPS - (time.time() - start_ts)
        if delta > 0:
            time.sleep(delta)
    print("Total reward: {:.2f}".format(total_reward))

And here's the error:

File "03_dqn_play.py", line 33, in
state_v = torch.tensor( np.array( [ state ], copy = False ) )
TypeError: int() argument must be a string, a bytes-like object or a number, not 'LazyFrames'

Upon further digging, I found the answer. The code from 03_dqn_play.py does not change. Copy that over unchanged. What you need to change is in the wrappers.py file.

In ImageToPyTorch.observation(), change np.moveaxis to np.swapaxes.
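For square Atari frames the two calls happen to produce the same output shape, even though they reorder the axes differently. A minimal numpy-only sketch of the difference (this is an illustration of the axis reordering, not the book's actual wrappers.py):

```python
import numpy as np

def to_chw_moveaxis(obs):
    # Original ImageToPyTorch.observation(): move the channel axis to
    # the front, keeping H and W in order: (H, W, C) -> (C, H, W).
    return np.moveaxis(obs, 2, 0)

def to_chw_swapaxes(obs):
    # Suggested replacement: exchange axes 0 and 2: (H, W, C) -> (C, W, H).
    # For square 84x84 Atari frames both yield the same shape.
    return np.swapaxes(obs, 2, 0)

# Dummy single-channel 84x84 frame in HWC layout, like the Atari
# preprocessing pipeline emits before the PyTorch reordering step.
frame = np.zeros((84, 84, 1), dtype=np.uint8)

print(to_chw_moveaxis(frame).shape)  # (1, 84, 84)
print(to_chw_swapaxes(frame).shape)  # (1, 84, 84)
```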

In make_env(), remove the call that creates a ScaledFloatFrame. The network already does the scaling. Just return env after the BufferWrapper.
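The point about scaling is that normalization can live inside the network's forward pass instead of in an env wrapper. A rough sketch of the pattern (a hypothetical tiny model, not the book's dqn_model.DQN):

```python
import torch
import torch.nn as nn

class TinyDQN(nn.Module):
    # Minimal stand-in showing byte-frame scaling inside the network,
    # which is why a ScaledFloatFrame wrapper becomes redundant.
    def __init__(self, n_actions):
        super().__init__()
        self.fc = nn.Linear(84 * 84, n_actions)

    def forward(self, x):
        # Convert uint8 frames to floats in [0, 1] inside the model,
        # so the env can keep emitting compact uint8 observations.
        fx = x.float() / 255.0
        return self.fc(fx.view(fx.size(0), -1))

net = TinyDQN(n_actions=6)
obs = torch.zeros(1, 84 * 84, dtype=torch.uint8)
print(net(obs).shape)  # torch.Size([1, 6])
```

Keeping observations as uint8 until the forward pass also keeps the replay buffer roughly four times smaller than storing float32 frames.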

Found here: Shmuma/ptan#19 (comment)

Hey,

thank you so much, anothercodejunkie, for the reply. Unfortunately I am on the move right now and can't implement your suggestions and give you feedback, but I will definitely do so once things are set up and ready again.

Thanks again

Martin