snowkylin / ntm

TensorFlow implementation of Neural Turing Machines (NTM), with its application to one-shot learning (MANN)


Initialization regarding the addressing and memory matrix.

shamanez opened this issue

From this part onward you have initialized the memory matrix, previous read vector, and weight list as variables. So these variables can also get updated by the optimization. Isn't that a problem? These things should be dynamic.

Since the NTM paper does not give details about this, I just follow my own thinking here. I think it is not a big problem whether to initialize them as variables or constants, but the former might have better performance since we can try to find a better initialization via training. You can try to initialize them as constants so that training might be faster.
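For illustration, a minimal sketch of the two options, written in TF 1.x graph style like this repo; the names `init_memory_var`, `memory_size`, and `memory_dim` are placeholders, not the repo's actual identifiers:

```python
import tensorflow as tf  # assuming TF 1.x, as used in this repo

memory_size, memory_dim = 128, 20  # hypothetical N x M memory shape

# Option 1: learnable initial memory -- a trainable variable, so the
# optimizer can search for a better starting point during training.
init_memory_var = tf.get_variable(
    'init_memory', shape=[memory_size, memory_dim],
    initializer=tf.constant_initializer(1e-6))

# Option 2: constant initial memory -- fixed small values, nothing extra
# to learn, so there is slightly less to optimize.
init_memory_const = tf.fill([memory_size, memory_dim], 1e-6)
```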

As in the paper, the previous memory, weight set, and memory matrix are not parameters, right? We update them with an addressing mechanism that depends on the controller and other fully connected parameters. So after training we have an updated memory matrix, read vector, weight set, etc. Then at test time it uses them to do the copy task. Am I right?

Anyway, with your explanation I think what you did is correct. At inference, when we first run the sequential task, the memory matrix and the other things are just initialized vectors, so it's good to tune everything as you said.
It will also reduce the test-time variance.
Your thoughts?

The memory matrix, read vectors, etc. are not parameters or variables. They are more similar to the hidden state of an RNN cell, whose value changes dynamically through the loop of the RNN. After training we will not have specific optimized values of the memory matrix, read vectors, etc. When we start inference, they will be initialized to constants using zero_state(), just like what you would do with the hidden state of an RNN cell.
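To make the analogy concrete, here is a rough sketch of what a `zero_state()`-style helper could look like for an NTM-like cell, again assuming TF 1.x; the function name and its arguments are hypothetical, not this repo's actual code:

```python
import tensorflow as tf  # assuming TF 1.x, as used in this repo

def ntm_zero_state(batch_size, memory_size=128, memory_dim=20, n_read_heads=1):
    """Hypothetical initial state, analogous to RNNCell.zero_state().
    Memory, read vectors and weights are state, not trainable parameters:
    they get reset like this at the start of every sequence."""
    memory = tf.fill([batch_size, memory_size, memory_dim], 1e-6)
    read_vectors = [tf.zeros([batch_size, memory_dim])
                    for _ in range(n_read_heads)]
    # start with the attention weights focused on the first memory slot
    weights = [tf.one_hot(tf.zeros([batch_size], dtype=tf.int32),
                          depth=memory_size)
               for _ in range(n_read_heads)]
    return memory, read_vectors, weights
```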

So, as you said, using variables to initialize the memory matrix tends to give a better initialization at the end of training, which we can use as the zero state at the beginning of testing or inference.
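If the initial memory really is a trainable variable, reusing it at inference could look roughly like the sketch below (TF 1.x style, hypothetical names): the single learned initial memory is simply tiled across the batch to serve as the zero state.

```python
import tensorflow as tf  # assuming TF 1.x, as used in this repo

# Hypothetical learnable initial memory, shared by every sequence.
init_memory_var = tf.get_variable(
    'init_memory', shape=[128, 20],
    initializer=tf.constant_initializer(1e-6))

def initial_state(batch_size):
    # Tile the learned memory across the batch; after training, this learned
    # value acts as the "zero state" at the start of testing or inference.
    return tf.tile(tf.expand_dims(init_memory_var, axis=0), [batch_size, 1, 1])
```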