awjuliani / DeepRL-Agents

A set of Deep Reinforcement Learning Agents implemented in Tensorflow.

Are you sure Deep recurrent notebook is correct?

Joshuaalbert opened this issue

In the notebook I don't see where your recurrent Q-value model gets its trace dimension. You're just reshaping the output of a convnet and feeding it directly into an LSTM. Furthermore, shouldn't you also provide the non-zero initial state determined at play time? I.e., the internal state should be stored in the experience buffer and used during training. Correct me if I'm wrong, please.
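
To make the proposal concrete, here is a hedged sketch (illustrative names only, not the notebook's actual API) of a replay buffer that stores the LSTM state captured at play time alongside each transition, so training could start the recurrence from that state rather than from zeros:

```python
import random
from collections import deque

class RecurrentExperienceBuffer:
    """Illustrative buffer that keeps the play-time LSTM state with each
    transition; not the notebook's actual experience_buffer class."""

    def __init__(self, capacity=50000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done, lstm_state):
        # lstm_state: the (c, h) tuple the agent had when it saw `state`.
        self.buffer.append((state, action, reward, next_state, done, lstm_state))

    def sample(self, batch_size):
        # Each sampled transition carries its own stored lstm_state, which
        # would be fed as the initial_state of the recurrent Q-network.
        return random.sample(self.buffer, batch_size)
```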

Yep, I thought along similar lines, but it turns out the code works correctly.

  1. Reshaping issue
    Here batch_size and trace_length are set to 4 and 8. Each Qnetwork object (main, target) receives batch_size * trace_length = 32 frames. After conv4, the dimensions become (32, 1, 1, 512) = (batch * trace, w, h, hidden units); the trace dimension is recovered when this flat output is reshaped to (batch_size, trace_length, hidden units) before the LSTM (see the sketch after this list).
  2. The non-zero H0 is iteratively updated and passed to feed_dict[network.state]. This state is the last hidden state returned by each LSTM forward pass.
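
For concreteness, here is a minimal TF1-style sketch of how a flat batch of conv features can recover a trace dimension via a reshape before `tf.nn.dynamic_rnn`, and how the last hidden state can be fed back in through `feed_dict`. Variable names (`conv_flat`, `h_size`, etc.) are illustrative, not necessarily the notebook's identifiers:

```python
import tensorflow as tf

batch_size, trace_length, h_size = 4, 8, 512  # values quoted above

# Stand-in for the flattened conv4 output: (batch_size * trace_length, h_size) = (32, 512).
conv_flat = tf.placeholder(tf.float32, [batch_size * trace_length, h_size])

# The trace dimension is recovered here: the flat batch of frames is
# reshaped into (batch, time, features) before the recurrent layer.
rnn_input = tf.reshape(conv_flat, [batch_size, trace_length, h_size])

# The hidden state is fed from outside: zeros at the start of a sampled
# trace, or the state carried over from the previous step while playing.
c_in = tf.placeholder(tf.float32, [batch_size, h_size])
h_in = tf.placeholder(tf.float32, [batch_size, h_size])
state_in = tf.nn.rnn_cell.LSTMStateTuple(c_in, h_in)

cell = tf.nn.rnn_cell.LSTMCell(num_units=h_size)
rnn_out, rnn_state = tf.nn.dynamic_rnn(cell, rnn_input, initial_state=state_in)

# rnn_state is the "last hidden state"; at play time it is passed back
# into (c_in, h_in) on the next forward pass via feed_dict.
```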

I had another thought. Isn't it unnecessary to have a target network for this notebook in the first place, since you are setting the target network to be equal to the mainDQN right before training?
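
For context, a target network only adds stability if its parameters lag behind the main network. Below is a hedged sketch of the soft ("Polyak-style") target update used in many two-network DQN scripts; `tau` and the variable-ordering convention are assumptions, not a claim about this notebook's exact code. If the target were instead hard-copied from mainDQN immediately before every training step, the two networks would coincide and the target network would indeed be redundant.

```python
import tensorflow as tf

def make_target_update_ops(tau=0.001):
    """Soft (Polyak) update: move each target-network variable a fraction
    tau toward its main-network counterpart. Assumes the first half of
    tf.trainable_variables() belongs to the main network and the second
    half to the target network, a common two-network DQN convention."""
    tvars = tf.trainable_variables()
    half = len(tvars) // 2
    ops = []
    for main_var, target_var in zip(tvars[:half], tvars[half:]):
        ops.append(target_var.assign(
            tau * main_var.value() + (1.0 - tau) * target_var.value()))
    return ops

# With tau << 1 the target network lags behind the main network, which is
# what stabilizes the bootstrapped Q targets. With tau = 1.0 (a hard copy
# right before every training step) the two networks coincide and the
# target network would indeed be redundant.
```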