PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt

Chapter6 DQN Pong can't calculate loss

pmsdOliveira opened this issue · comments

I'm running the exact same code, yet I get this error:

```
C:\Users\Utilizador\anaconda3\python.exe "C:/Users/Utilizador/Thesis/Deep Reinforcement Learning Hands-On/Chapter6/02_dqn_pong.py"
DQN(
  (conv): Sequential(
    (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
    (1): ReLU()
    (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
    (3): ReLU()
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
  )
  (fc): Sequential(
    (0): Linear(in_features=3136, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=6, bias=True)
  )
)
880: done 1 games, mean reward -21.000, eps 0.99, speed 720.31 f/s
1848: done 2 games, mean reward -21.000, eps 0.98, speed 659.82 f/s
2670: done 3 games, mean reward -21.000, eps 0.97, speed 642.90 f/s
3492: done 4 games, mean reward -21.000, eps 0.97, speed 666.83 f/s
4659: done 5 games, mean reward -20.600, eps 0.95, speed 671.71 f/s
Best mean reward updated -21.000 -> -20.600, model saved
5765: done 6 games, mean reward -20.167, eps 0.94, speed 629.02 f/s
Best mean reward updated -20.600 -> -20.167, model saved
6682: done 7 games, mean reward -20.143, eps 0.93, speed 645.23 f/s
Best mean reward updated -20.167 -> -20.143, model saved
7668: done 8 games, mean reward -20.125, eps 0.92, speed 628.90 f/s
Best mean reward updated -20.143 -> -20.125, model saved
8648: done 9 games, mean reward -20.000, eps 0.91, speed 606.56 f/s
Best mean reward updated -20.125 -> -20.000, model saved
9919: done 10 games, mean reward -19.700, eps 0.90, speed 623.48 f/s
Best mean reward updated -20.000 -> -19.700, model saved
Traceback (most recent call last):
  File "C:/Users/Utilizador/Thesis/Deep Reinforcement Learning Hands-On/Chapter6/02_dqn_pong.py", line 169, in <module>
    loss_t = calc_loss(batch, net, tgt_net, device=device)
  File "C:/Users/Utilizador/Thesis/Deep Reinforcement Learning Hands-On/Chapter6/02_dqn_pong.py", line 96, in calc_loss
    state_action_values = net(states_v).gather(1, actions_v.unsqueeze(-1)).squeeze(-1)
RuntimeError: gather_out_cpu(): Expected dtype int64 for index

Process finished with exit code 1
```

I'm relatively new to Python and couldn't find a solution to this kind of problem anywhere else. Has anyone run into the same issue, or does anyone know how to fix it?
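For context on the error itself: PyTorch's `gather` expects its index tensor to be int64 (a LongTensor), so passing actions stored with a smaller integer dtype triggers exactly this RuntimeError. A minimal standalone sketch (not taken from the chapter code) that reproduces and fixes it:

```python
import torch

q_values = torch.randn(2, 6)                       # fake Q-values for 2 states, 6 actions
actions = torch.tensor([1, 3], dtype=torch.int32)  # int32 indices, as a replay buffer might return

# q_values.gather(1, actions.unsqueeze(-1))        # RuntimeError: Expected dtype int64 for index
picked = q_values.gather(1, actions.unsqueeze(-1).long())  # works once the index is int64
print(picked.squeeze(-1))
```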

I just found that if you're on a newer version of PyTorch you should define your `calc_loss` like this:

```python
def calc_loss(batch, net, tgt_net, device="cpu"):
    states, actions, rewards, dones, next_states = batch

    states_v = torch.tensor(states).to(device)
    next_states_v = torch.tensor(next_states).to(device)
    # gather() needs an int64 index, so cast the actions explicitly
    actions_v = torch.tensor(actions).to(device, dtype=torch.int64)
    rewards_v = torch.tensor(rewards).to(device)
    # boolean mask of terminal transitions
    done_mask = torch.tensor(dones).to(device, dtype=torch.bool)

    # Q(s, a) for the actions actually taken
    state_action_values = net(states_v).gather(1, actions_v.unsqueeze(-1)).squeeze(-1)
    # max_a' Q_target(s', a'), zeroed out for terminal states
    next_state_values = tgt_net(next_states_v).max(1)[0]
    next_state_values[done_mask] = 0.0
    next_state_values = next_state_values.detach()

    expected_state_action_values = next_state_values * GAMMA + rewards_v
    return nn.MSELoss()(state_action_values, expected_state_action_values)
```

The only differences from the code in the book are the declarations of the `actions_v` and `done_mask` variables: they now set `dtype=torch.int64` and `dtype=torch.bool` explicitly.
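If you'd rather not rewrite the whole function, an equivalent minimal change (assuming the book's original variable names) is to cast the index tensor right where `gather` is called:

```python
# only the indexing line changes; .long() converts the action indices to int64
state_action_values = net(states_v).gather(
    1, actions_v.unsqueeze(-1).long()).squeeze(-1)
```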