google-deepmind / dnc

A TensorFlow implementation of the Differentiable Neural Computer.

Confusion about the memory module

alirezazareian opened this issue · comments

I have a very basic question, and I would be grateful if someone could help.

From the code, it seems that in every execution of _, loss = sess.run([train_step, train_loss]), the content of the memory (i.e., the initial state of the DNC) is re-initialized by zero_state. This means each instance of the RepeatCopy task is processed using an empty memory. However, from my understanding of the paper and of memory networks in general, I believe the content of the memory should be incrementally updated with every instance of the task, and this accumulated knowledge should be used at test time.
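For reference, here is a simplified sketch of how I read the graph construction in train.py (the configuration values and input shapes here are illustrative, not the repository's actual flags):

```python
import tensorflow as tf
import dnc

batch_size = 16
access_config = dict(memory_size=16, word_size=16, num_reads=4, num_writes=1)
controller_config = dict(hidden_size=64)

# The DNC core and its zero state are built once as part of the graph.
dnc_core = dnc.DNC(access_config, controller_config, output_size=4, clip_value=20)

# initial_state is a nest of zero tensors (memory, usage, link matrix, read
# weights, controller state, ...), not trainable Variables, so every sess.run
# that evaluates the unrolled network starts from an empty memory.
initial_state = dnc_core.initial_state(batch_size)

observations = tf.placeholder(tf.float32, [None, batch_size, 6])  # [time, batch, features]
output_sequence, final_state = tf.nn.dynamic_rnn(
    cell=dnc_core,
    inputs=observations,
    time_major=True,
    initial_state=initial_state)
```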

If my understanding of the code is correct, the only thing that is incrementally trained is the controller, and it only learns how to use an empty memory to solve a task. At test time, it then uses the memory as a temporary place to write intermediate processing results before taking action. On the other hand, if my understanding of the paper is correct, the controller and the memory content are both incrementally trained. At test time, the controller matches the given task against what it already has in memory and decides what to do.

Could someone please clarify which one is correct? If the memory content is actually accumulated incrementally during training, please point me to the part of the code where this happens, because the training loop shows no sign of preserving the memory state between training steps.

I have seen other memory-network implementations in which the memory content is a TF Variable that is preserved and incrementally updated during training; see, for example, https://github.com/tensorflow/models/tree/master/learning_to_remember_rare_events.
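To illustrate the contrast I mean, here is a minimal hypothetical sketch (the names and shapes are mine, not taken from that code) of memory kept in a TF Variable so that it persists across training steps:

```python
import tensorflow as tf

memory_size, word_size = 128, 32

# Memory stored in a Variable: its contents survive from one sess.run call to
# the next instead of being rebuilt from zeros every step.
memory = tf.get_variable(
    "external_memory", [memory_size, word_size],
    initializer=tf.zeros_initializer(), trainable=False)

write_indices = tf.placeholder(tf.int32, [None])
write_values = tf.placeholder(tf.float32, [None, word_size])

# Running this op alongside the train step makes the update visible to the
# next training step.
write_op = tf.scatter_update(memory, write_indices, write_values)
```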

Thank you very much in advance for your help.

For the DNC tasks, the external memory (and the LSTM cell memory) is wiped between episodes, because these tasks do not require memory to be preserved across episodes. For example, when copying random binary strings, once an episode has concluded we do not want to carry the memory state into the next episode, because it would interfere with the next binary string to be copied. We want the controller's weights to learn the algorithmic solution over many examples; the memory simply serves as a scratch pad for storing intermediate information.

In tasks where memories may persist across episodes, such as language modelling or translation, one can simply pass the output state between session run calls to preserve the memory contents. In that setting one might also want to optimize the contents with gradient descent, using tricks like those of Kaiser et al., in which case making the memory a variable makes sense.
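A rough sketch of that state-passing pattern (illustrative only; the configuration values and the episode iterator are placeholders, not part of this repository):

```python
import tensorflow as tf
from tensorflow.python.util import nest
import dnc

batch_size = 16
access_config = dict(memory_size=16, word_size=16, num_reads=1, num_writes=1)
controller_config = dict(hidden_size=64)
dnc_core = dnc.DNC(access_config, controller_config, output_size=4)

zero_state = dnc_core.initial_state(batch_size)

# One placeholder per tensor in the (nested) DNC state, so the previous
# episode's final state can be fed back in as the next initial state.
state_placeholders = nest.map_structure(
    lambda t: tf.placeholder(t.dtype, t.shape), zero_state)

inputs = tf.placeholder(tf.float32, [None, batch_size, 6])  # [time, batch, features]
outputs, final_state = tf.nn.dynamic_rnn(
    cell=dnc_core, inputs=inputs, time_major=True,
    initial_state=state_placeholders)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  state_value = sess.run(zero_state)      # start from empty memory once
  for episode_inputs in episodes:         # `episodes` is your own data iterator
    feed = {inputs: episode_inputs}
    feed.update(dict(zip(nest.flatten(state_placeholders),
                         nest.flatten(state_value))))
    # The final state of this run is fed back in on the next iteration,
    # so memory contents persist across episodes.
    _, state_value = sess.run([outputs, final_state], feed_dict=feed)
```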

Thank you very much for your informative response. It makes a lot of sense.