Chapter 7 (DQN algorithm): training fails with ValueError: expected sequence of length 4 at dim 2 (got 0)

Question

horacehht opened this issue 2 years ago · comments

Running the fifth code cell raises ValueError: expected sequence of length 4 at dim 2 (got 0). The full traceback is:
`ValueError Traceback (most recent call last)
f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 5 in <cell line: 26>()
31 done = False
32 while not done:
---> 33 action = agent.take_action(state) # choose an action for the current state
34 next_state, reward, done, _ = env.step(action) # interact with the environment
35 replay_buffer.add(state, action, reward, next_state, done) # store (state, action, reward, next state, done) in the replay buffer

f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 5 in DQN.take_action(self, state)
22 action = np.random.randint(self.action_dim)
23 else:# exploit
---> 24 state = torch.tensor([state], dtype=torch.float).to(self.device)
25 action = self.q_net(state).argmax().item() # pick the highest-scoring action; item() converts it to a built-in Python number
26 return action

ValueError: expected sequence of length 4 at dim 2 (got 0)`
I only added comments and did not change any code. Code from a textbook should not fail like this unless a version difference is the cause.

ialwayshungry commented a year ago

thanks!!!

Haitao Huang · Answer 1 · Sat Feb 11 2023 21:23:05 GMT+0800 (China Standard Time)

I am using gym 0.26.2.

Haitao Huang · Answer 2 · Sat Feb 11 2023 21:47:37 GMT+0800 (China Standard Time)

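After checking the official documentation and debugging, I confirmed that the error is caused by a gym version change.
Cause: the return values of the env object's reset and step methods changed in gym.
Solution:
In the fifth code cell, change state = env.reset() to state = env.reset()[0], and change next_state, reward, done, _ = env.step(action) to next_state, reward, done, _, __ = env.step(action).
With these changes, training runs.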
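
For reference, the two changes can be seen together in a minimal sketch of the episode loop. `StubEnv` and `run_episode` are hypothetical stand-ins (they are not part of gym or of the book's code) that only mimic the return shapes of gym 0.26: `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. Unlike the one-line fix above, the sketch combines `terminated` and `truncated` into `done`, so an episode that is cut off by a time limit also ends the loop.

```python
import random


class StubEnv:
    """Hypothetical stand-in that mimics the gym >= 0.26 return shapes."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        observation = [0.0, 0.0, 0.0, 0.0]
        return observation, {}  # (observation, info)

    def step(self, action):
        self.steps += 1
        observation = [random.uniform(-0.05, 0.05) for _ in range(4)]
        reward = 1.0
        terminated = False  # the task itself never fails in this stub
        truncated = self.steps >= self.max_steps  # time limit reached
        return observation, reward, terminated, truncated, {}


def run_episode(env, choose_action):
    """Run one episode and return the collected transitions."""
    transitions = []
    state, _ = env.reset()  # was: state = env.reset()
    done = False
    while not done:
        action = choose_action(state)
        # was: next_state, reward, done, _ = env.step(action)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions


episode = run_episode(StubEnv(max_steps=5), lambda state: random.randint(0, 1))
print(len(episode))  # 5 transitions, ended by truncation
```

Whether the replay buffer should store `terminated` alone or the combined flag is a separate design choice; the sketch stores the combined flag only to stay close to the original loop.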

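How I tracked it down:
Inspecting state = env.reset() showed that state is a tuple whose first element is an array and whose second is a dict, e.g. (array([-0.0344371 , -0.01493822, -0.01339062, -0.02076969], dtype=float32), {})
Changing state = env.reset() to state = env.reset()[0] therefore fixes the error above (the input format).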
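
If the same notebook has to run under both older and newer gym releases, indexing with `[0]` unconditionally would be wrong on the older one, where `reset()` returns the observation itself. One possible workaround, sketched here with a hypothetical helper named `extract_observation`, is to unwrap the result only when it looks like an `(observation, info)` pair. The check is a heuristic, not something gym provides.

```python
def extract_observation(reset_result):
    """Return the observation from either reset() convention.

    Hypothetical helper: a 2-tuple whose second item is a dict is treated
    as the newer (observation, info) form; anything else is returned as-is.
    """
    if (
        isinstance(reset_result, tuple)
        and len(reset_result) == 2
        and isinstance(reset_result[1], dict)
    ):
        return reset_result[0]
    return reset_result


new_style = ([-0.034, -0.015, -0.013, -0.021], {})  # shape seen with gym 0.26.2
old_style = [-0.034, -0.015, -0.013, -0.021]        # bare observation
print(extract_observation(new_style) == extract_observation(old_style))  # True
```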

After that change, however, a new error appears: ValueError: too many values to unpack (expected 4). The full traceback is:

`ValueError                                Traceback (most recent call last)
f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 6 in <cell line: 26>()
     32 while not done:
     33     action = agent.take_action(state)  # choose an action for the current state
---> 34     next_state, reward, done, _ = env.step(action)  # interact with the environment
     35     replay_buffer.add(state, action, reward, next_state, done)  # store (state, action, reward, next state, done) in the replay buffer
     36     state = next_state

ValueError: too many values to unpack (expected 4)`

Inspecting the output of env.step(action) showed that it returns five values while the left-hand side of the assignment has only four names, hence the error. Changing next_state, reward, done, _ = env.step(action) to next_state, reward, done, _, __ = env.step(action) resolves it.
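
In gym 0.26 the five values are `(observation, reward, terminated, truncated, info)`, so the fix above binds `done` to `terminated` and discards `truncated`. As a rough sketch, a hypothetical helper `normalize_step` can accept either the four-value or the five-value form and always return five values. Treating the old single `done` flag as `terminated` is an assumption made for illustration.

```python
def normalize_step(step_result):
    """Map either step() convention to
    (next_state, reward, terminated, truncated, info)."""
    if len(step_result) == 5:
        return tuple(step_result)
    if len(step_result) == 4:
        next_state, reward, done, info = step_result
        # Assumption: the older single flag is treated as termination.
        return next_state, reward, done, False, info
    raise ValueError(f"unexpected step() result of length {len(step_result)}")


old = normalize_step(([0.0] * 4, 1.0, True, {}))
new = normalize_step(([0.0] * 4, 1.0, False, True, {}))

_, _, terminated, truncated, _ = new
done = terminated or truncated
print(done)  # True: the episode ended by truncation
```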

kongshengqi · Answer 3 · Sat Apr 08 2023 15:52:00 GMT+0800 (China Standard Time)

Great! This solved my problem too!

Xinlei_Zhou · Answer 4 · Sun Apr 23 2023 23:17:26 GMT+0800 (China Standard Time)

Thanks, this helped a lot.

Yantao Zhang · Answer 5 · Thu May 18 2023 14:44:04 GMT+0800 (China Standard Time)

Changing state = env.reset() to state, info = env.reset() fixes this error.

jennie1124 · Answer 6 · Wed May 24 2023 14:47:26 GMT+0800 (China Standard Time)

Thanks, very helpful!

XianXuehui · Answer 7 · Thu Sep 07 2023 11:32:30 GMT+0800 (China Standard Time)

@horacehht Thanks!!
