The effects of left-multiplying vs. right-multiplying the stochastic matrix (i.e. the parameter --flip)
jiayao6 opened this issue · comments
Thank you so much for your great work and for sharing the code.
In Eq. 1 of the paper, the affinities are normalized by a row-wise softmax, so the transition matrices are row-stochastic. I therefore think the multiplication in Eq. 2 should be a right multiplication: A_{t}^{t+1} * A_{t+1}^{t+2} composes the walk in time order and makes sense, while the left multiplication A_{t+1}^{t+2} * A_{t}^{t+1} does not. Eq. 4 also indicates that the right multiplication is the correct one.
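To make the ordering argument concrete, here is a small numpy sketch (my own illustration, not the repo's code). It uses permutation matrices, which are the sharpest possible row-stochastic matrices, to show that only the right multiply composes the steps of the walk in time order:

```python
import numpy as np

# Row-stochastic transition matrices: entry (i, j) of A_t^{t+1} is
# P(node j at t+1 | node i at t). Chaining a walk t -> t+1 -> t+2
# therefore requires the RIGHT multiply A_t^{t+1} @ A_{t+1}^{t+2}.
n = 4
p = [1, 2, 3, 0]   # step t -> t+1 sends node i to p[i]
q = [2, 0, 3, 1]   # step t+1 -> t+2 sends node i to q[i]

A12 = np.eye(n)[p]  # row i is one-hot at p[i]
A23 = np.eye(n)[q]

right = A12 @ A23   # walk t -> t+1 -> t+2, in time order
left = A23 @ A12    # composes the two steps in the wrong order

for i in range(n):
    assert right[i].argmax() == q[p[i]]  # i -> p[i] -> q[p[i]]: correct
    assert left[i].argmax() == p[q[i]]   # wrong-order composition

# Both products are still row-stochastic, so the error is purely semantic:
assert np.allclose(right.sum(axis=1), 1)
assert np.allclose(left.sum(axis=1), 1)
```

Note that both products have rows summing to one, so the left multiply does not break normalization; it just chains the transitions in the wrong order.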
However, when we reproduce the experiments, the result is the opposite. The left multiplication (--flip True) works, with performance above 0.67; the right multiplication (--flip False, the default) does not, with performance only around 0.2, which is below a randomly initialized model (around 0.4).
I cannot explain why this happens; could you help me find the reason?
Looking forward to your reply, thank you!
Hi @jiayao6,
Thanks for the detailed question! It is indeed interesting that the left multiply still works. I suggested a hypothesis in the README: essentially, I believe that since the patches do not all change position between time steps, the left multiply results in an occasional shuffling of a few patches, which acts similarly to edge dropout.
I should note that the right multiply still works (as it should), but you need to tune the temperature hyper-parameter to a different setting. We found that we needed a slightly smaller temperature (a sharper distribution): 0.07 was too high for the right multiply, but values from 0.01 to 0.05 were fine.
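The temperature's effect on sharpness can be sketched as follows (a minimal illustration, not the repo's code; the affinity values here are random placeholders). Dividing the affinities by a smaller tau before the row-wise softmax concentrates each row's probability mass, which is what the right multiply appears to need:

```python
import numpy as np

def softmax_rows(logits, tau):
    """Row-wise softmax with temperature tau (smaller tau -> sharper rows)."""
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
aff = rng.normal(size=(5, 5))  # placeholder pairwise affinities

sharp = softmax_rows(aff, tau=0.01)  # in the range that worked
soft = softmax_rows(aff, tau=0.07)   # too high for the right multiply

# Mean row entropy: lower means sharper transition distributions.
entropy = lambda P: -(P * np.log(P + 1e-12)).sum(axis=1).mean()
assert entropy(sharp) < entropy(soft)
```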
The command provided in the README reproduces the pretrained model provided.
Best,
Allan