Chapter 9 (Policy Gradient Algorithm): where does the cross-entropy loss appear in the code?

Question

chensisi0730 opened this issue a year ago

In Chapter 9 (Policy Gradient Algorithm), where does the cross-entropy loss show up in the code?
Is it this line: log_prob = torch.log(self.policy_net(state).gather(1, action))?
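
For context, here is a minimal sketch of why that line can be read as a cross-entropy term. It assumes a discrete action space and a policy network that outputs a probability per action. The names `probs`, `one_hot`, `cross_entropy`, and `G` are illustrative and not taken from the chapter's code, and the final `-log_prob * G` loss is the usual REINFORCE form, assumed here rather than quoted from the book. Plain Python stands in for the tensor operations so the snippet runs without PyTorch.

```python
import math

def cross_entropy(target, probs):
    """Cross-entropy H(target, probs) = -sum_i target[i] * log(probs[i])."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Hypothetical policy output for one state: probabilities over 3 actions.
probs = [0.2, 0.5, 0.3]
action = 1  # index of the action that was actually sampled

# Single-sample analogue of torch.log(policy_net(state).gather(1, action)):
# pick the probability of the sampled action and take its log.
log_prob = math.log(probs[action])

# One-hot target distribution for the sampled action.
one_hot = [1.0 if i == action else 0.0 for i in range(len(probs))]
ce = cross_entropy(one_hot, probs)

# With a one-hot target, the cross-entropy reduces to -log(probs[action]).
assert math.isclose(-log_prob, ce)

# Assumed REINFORCE-style loss: the cross-entropy term weighted by the return G.
G = 2.0  # hypothetical return
loss = -log_prob * G
```

Under these assumptions, the `gather` plus `log` step selects exactly the term that a cross-entropy against the sampled action would keep, so the negated log-probability is that cross-entropy, and the return acts as a per-sample weight on it.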
