The effects of left-multiplying vs. right-multiplying the stochastic matrix (i.e. the parameter --flip)
jiayao6 opened this issue · comments
Thank you so much for your great work and for sharing the code.
In Eq. 1 of the paper, the affinities are normalized by a row-wise softmax, so the transition matrices are row-stochastic. I therefore think the multiplication in Eq. 2 should be a right multiplication: A_{t}^{t+1} * A_{t+1}^{t+2} composes the walk in time order and makes sense, while the left multiplication A_{t+1}^{t+2} * A_{t}^{t+1} does not. Eq. 4 also indicates that the right multiplication is the correct one.
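To make the ordering argument concrete, here is a small numpy sketch (my own illustration, not the repo's code). It uses permutation matrices, which are the sharpest possible row-stochastic matrices, to show that only the right multiply composes the steps of the walk in time order:

```python
import numpy as np

# Row-stochastic transition matrices: entry (i, j) of A_t^{t+1} is
# P(node j at t+1 | node i at t). Chaining a walk t -> t+1 -> t+2
# therefore requires the RIGHT multiply A_t^{t+1} @ A_{t+1}^{t+2}.
n = 4
p = [1, 2, 3, 0]   # step t -> t+1 sends node i to p[i]
q = [2, 0, 3, 1]   # step t+1 -> t+2 sends node i to q[i]

A12 = np.eye(n)[p]  # row i is one-hot at p[i]
A23 = np.eye(n)[q]

right = A12 @ A23   # walk t -> t+1 -> t+2, in time order
left = A23 @ A12    # composes the two steps in the wrong order

for i in range(n):
    assert right[i].argmax() == q[p[i]]  # i -> p[i] -> q[p[i]]: correct
    assert left[i].argmax() == p[q[i]]   # wrong-order composition

# Both products are still row-stochastic, so the error is purely semantic:
assert np.allclose(right.sum(axis=1), 1)
assert np.allclose(left.sum(axis=1), 1)
```

Note that both products have rows summing to one, so the left multiply does not break normalization; it just chains the transitions in the wrong order.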
However, when we reproduce the experiments, the result is the opposite. The left multiplication (--flip True) works, with performance above 0.67; the right multiplication (--flip False, the default) does not, with performance only around 0.2, which is below a randomly initialized model (around 0.4).
I cannot explain why this happens; could you help me find the reason?
Looking forward to your reply, thank you!
Hi @jiayao6,
Thanks for the detailed question! It is indeed interesting that the left multiply still works. I suggested a hypothesis in the README: essentially, I believe that since the patches do not all change position between time steps, the left multiply results in an occasional shuffling of a few patches, which acts similarly to edge dropout.
I should note that the right multiply still works (as it should), but you need to tune the temperature hyper-parameter to a different setting. We found that we needed a slightly smaller temperature (a sharper distribution): 0.07 was too high for the right multiply, but values from 0.01 to 0.05 were fine.
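The temperature's effect on sharpness can be sketched as follows (a minimal illustration, not the repo's code; the affinity values here are random placeholders). Dividing the affinities by a smaller tau before the row-wise softmax concentrates each row's probability mass, which is what the right multiply appears to need:

```python
import numpy as np

def softmax_rows(logits, tau):
    """Row-wise softmax with temperature tau (smaller tau -> sharper rows)."""
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
aff = rng.normal(size=(5, 5))  # placeholder pairwise affinities

sharp = softmax_rows(aff, tau=0.01)  # in the range that worked
soft = softmax_rows(aff, tau=0.07)   # too high for the right multiply

# Mean row entropy: lower means sharper transition distributions.
entropy = lambda P: -(P * np.log(P + 1e-12)).sum(axis=1).mean()
assert entropy(sharp) < entropy(soft)
```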
The command provided in the README reproduces the pretrained model provided.
Best,
Allan