ml-jku / hopfield-layers

Hopfield Networks is All You Need

Home Page: https://ml-jku.github.io/hopfield-layers/

xi in the code is different from the paper?

Xinpeng-Wang opened this issue

Hi, nice work!
As I read the code, I found that the xi in the code is actually the softmax output of the key-query association matrix.

xi = nn.functional.softmax(attn_output_weights, dim=-1)

But the paper says it is the product of the softmax and the stored patterns.
[screenshot of the corresponding equation from the paper]

Can you explain that?

Hi @Xinpeng-Wang,

thanks for your interest in our work! The xi in

xi = nn.functional.softmax(attn_output_weights, dim=-1)

is the p in equations (3) or (442). The xi, as described in our paper, is ultimately computed as
attn_output = torch.bmm(attn_output_weights, v)

and is termed attn_output to stay in line with the official PyTorch repository, v1.6.0 (see the Disclaimer for more details).
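To make the mapping concrete, here is a minimal sketch of how the two quantities relate (the toy shapes and random tensors are my own assumption; the variable names mirror the snippets above):

import torch
import torch.nn as nn

# toy example: batch of 1, 3 state patterns (queries), 5 stored patterns (keys/values) of dim 4
attn_output_weights = torch.randn(1, 3, 5)   # raw query-key association matrix
v = torch.randn(1, 5, 4)                     # stored patterns (values)

# p of equation (3): softmax over the stored-pattern dimension
xi = nn.functional.softmax(attn_output_weights, dim=-1)

# the xi of the paper: the softmax weights multiplied with the stored patterns
attn_output = torch.bmm(xi, v)               # shape (1, 3, 4)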

Yes, but in the case of multiple updates, as described in the paper, a threshold is applied to xi_new and xi_old.
[screenshot of the update-threshold equation from the paper]
And in the code this threshold is also applied to the xi.

update_active_heads &= ((xi_old - xi_active).norm(p=2, dim=(2, 3)).max(axis=0)[0]) > update_steps_eps

Yes, strictly speaking, the threshold in the implementation is applied directly to p. But if p does not change between update steps, xi does not change either. The naming in the implementation is a bit misleading, as p is used as a proxy for xi in this case.
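A small numerical illustration of why p works as a proxy (the shapes and random tensors are my own toy assumption): since the stored patterns are fixed across update steps, the change in xi = p @ v is bounded by the change in p, so thresholding the difference in p also controls the change in xi.

import torch

# toy shapes (my own assumption): 3 state patterns, 5 stored patterns of dim 4
v = torch.randn(5, 4)                              # stored patterns, fixed across update steps
p_old = torch.softmax(torch.randn(3, 5), dim=-1)   # softmax association matrix before an update
p_new = torch.softmax(torch.randn(3, 5), dim=-1)   # ... and after an update

# ||p_old @ v - p_new @ v||_F <= ||p_old - p_new||_F * ||v||_F,
# so once the threshold on p is met, the change in xi is small as well
change_in_xi = (p_old @ v - p_new @ v).norm(p=2)
bound = (p_old - p_new).norm(p=2) * v.norm(p=2)
assert change_in_xi <= bound + 1e-6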

A small note on multiple updates in general: one update step is already enough, as stated in Theorem 4 of the paper.
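As a rough illustration of that point (a toy example of my own, not code from the repository): with well-separated stored patterns and a sufficiently large beta, a single update step already moves a noisy state onto the nearest stored pattern.

import torch

torch.manual_seed(0)
beta = 8.0
stored = torch.randn(6, 16)                        # 6 stored patterns of dimension 16
state = stored[2] + 0.1 * torch.randn(16)          # noisy version of stored pattern 2

p = torch.softmax(beta * stored @ state, dim=-1)   # softmax over the pattern similarities
retrieved = p @ stored                             # one Hopfield update step

print(torch.allclose(retrieved, stored[2], atol=1e-1))  # typically True after a single step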