ml-jku / hopfield-layers

Hopfield Networks is All You Need

Home Page: https://ml-jku.github.io/hopfield-layers/

xi in the code is different from the paper?

Xinpeng-Wang opened this issue

Hi, nice work!
As I read the code, I found that the xi in the code is actually the softmax output of the key-query association matrix.

xi = nn.functional.softmax(attn_output_weights, dim=-1)

But the paper says it is the product of the softmax and the stored patterns.
[screenshot of the corresponding equation from the paper]

Can you explain that?

Hi @Xinpeng-Wang,

thanks for your interest in our work! The xi in

xi = nn.functional.softmax(attn_output_weights, dim=-1)

is the p in equations (3) or (442). The xi, as described in our paper, is ultimately computed as
attn_output = torch.bmm(attn_output_weights, v)

and is termed attn_output to stay in line with the official PyTorch repository, v1.6.0 (see the Disclaimer for more details).
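To make the mapping concrete, here is a minimal sketch of how the two quantities relate (the toy shapes and random tensors are my own assumption; the variable names mirror the snippets above):

import torch
import torch.nn as nn

# toy example: batch of 1, 3 state patterns (queries), 5 stored patterns (keys/values) of dim 4
attn_output_weights = torch.randn(1, 3, 5)   # raw query-key association matrix
v = torch.randn(1, 5, 4)                     # stored patterns (values)

# p of equation (3): softmax over the stored-pattern dimension
xi = nn.functional.softmax(attn_output_weights, dim=-1)

# the xi of the paper: the softmax weights multiplied with the stored patterns
attn_output = torch.bmm(xi, v)               # shape (1, 3, 4)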

Yes, but in the case of multiple updates, as described in the paper, a threshold is applied to xi_new and xi_old.
[screenshot of the update-threshold equation from the paper]
And in the code this threshold is also applied to the xi.

update_active_heads &= ((xi_old - xi_active).norm(p=2, dim=(2, 3)).max(axis=0)[0]) > update_steps_eps

Yes, strictly speaking, the threshold in the implementation is applied directly to p. But if p does not change between update steps, xi does not change either. The naming in the implementation is a bit misleading, as p is used as a proxy for xi in this case.
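A small numerical illustration of why p works as a proxy (the shapes and random tensors are my own toy assumption): since the stored patterns are fixed across update steps, the change in xi = p @ v is bounded by the change in p, so thresholding the difference in p also controls the change in xi.

import torch

# toy shapes (my own assumption): 3 state patterns, 5 stored patterns of dim 4
v = torch.randn(5, 4)                              # stored patterns, fixed across update steps
p_old = torch.softmax(torch.randn(3, 5), dim=-1)   # softmax association matrix before an update
p_new = torch.softmax(torch.randn(3, 5), dim=-1)   # ... and after an update

# ||p_old @ v - p_new @ v||_F <= ||p_old - p_new||_F * ||v||_F,
# so once the threshold on p is met, the change in xi is small as well
change_in_xi = (p_old @ v - p_new @ v).norm(p=2)
bound = (p_old - p_new).norm(p=2) * v.norm(p=2)
assert change_in_xi <= bound + 1e-6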

A small note on multiple updates in general: one update step is already enough, as stated in Theorem 4 of the paper.
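As a rough illustration of that point (a toy example of my own, not code from the repository): with well-separated stored patterns and a sufficiently large beta, a single update step already moves a noisy state onto the nearest stored pattern.

import torch

torch.manual_seed(0)
beta = 8.0
stored = torch.randn(6, 16)                        # 6 stored patterns of dimension 16
state = stored[2] + 0.1 * torch.randn(16)          # noisy version of stored pattern 2

p = torch.softmax(beta * stored @ state, dim=-1)   # softmax over the pattern similarities
retrieved = p @ stored                             # one Hopfield update step

print(torch.allclose(retrieved, stored[2], atol=1e-1))  # typically True after a single step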