xi in the code is different from the paper?
Xinpeng-Wang opened this issue
Hi, nice work!
As I read the code, I found that the xi in the code is actually the softmax output of the key-query association matrix.
hopfield-layers/hflayers/functional.py
Line 419 in f56f929
But in the paper, it says it is the product of the softmax and the stored patterns.
Can you explain that?
Hi @Xinpeng-Wang,
thanks for your interest in our work! The xi in
hopfield-layers/hflayers/functional.py
Line 419 in f56f929
is the p in equations (3), or (442). The xi, as described in our paper, is ultimately computed as
hopfield-layers/hflayers/functional.py
Line 439 in f56f929
and termed attn_output to be in line with the official PyTorch repository v1.6.0 (see Disclaimer for more details).
Yes, but in the case of multiple updates, as described in the paper, a threshold is applied on the xi_new and xi_old.
And this threshold is also applied on the xi in the code.
hopfield-layers/hflayers/functional.py
Line 429 in f56f929
Yes, strictly speaking, the threshold in the implementation is directly applied on the basis of p. But if p does not change between multiple updates, xi does not change either. The naming in the implementation is a little bit misleading, as p is used as a proxy for xi in this case.
A small note on multiple updates in general: one update step is already enough, as stated in Theorem 4 of the paper.
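To make the distinction between p and xi concrete, here is a minimal, dependency-free sketch of the update as discussed in this thread. It is not the code from hflayers/functional.py: the function names, the Euclidean-norm stopping test, and the use of the stored patterns as both keys and values (no learned projections) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_update(state, patterns, beta):
    """One update step (hypothetical helper).

    p      = softmax(beta * <pattern_i, state>)   # called xi in the code
    xi_new = sum_i p_i * pattern_i                # called xi in the paper
    """
    scores = [beta * sum(s * x for s, x in zip(state, pattern))
              for pattern in patterns]
    p = softmax(scores)
    xi_new = [sum(p_i * pattern[d] for p_i, pattern in zip(p, patterns))
              for d in range(len(state))]
    return p, xi_new


def retrieve(state, patterns, beta=1.0, max_steps=10, eps=1e-4):
    """Repeat the update until p stops changing.

    The stopping test looks only at p. Because xi_new is a fixed linear
    function of p, an unchanged p implies an unchanged xi_new.
    The norm and eps used here are illustrative assumptions.
    """
    xi = list(state)
    p_old = None
    for step in range(1, max_steps + 1):
        p, xi = hopfield_update(xi, patterns, beta)
        if p_old is not None:
            change = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_old)))
            if change < eps:
                break
        p_old = p
    return p, xi, step


# Usage: three stored patterns and a noisy query close to the first one.
patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
p_one, xi_one = hopfield_update(query, patterns, beta=8.0)
p_final, xi_final, steps = retrieve(query, patterns, beta=8.0)
print(p_one, xi_one)
print(p_final, xi_final, steps)
```

In this toy setting with well-separated patterns, the first update already puts almost all of the weight in p on the first pattern, and further updates change p and xi only marginally.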