Linear attention
shaochenze opened this issue
Hi Franz, I still feel that the proposed method can be classified as a variant of linear attention. To elaborate, the softmax-log function can be simplified to a form of linear normalization:
Assuming that the output of Equation 1 is 'V', we can then express it as:
which is equation 4 in [1] with φ = exp.
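For context, assuming [1] refers to the standard linear-attention formulation, its eq. (4) computes attention weights as normalized kernel scores φ(q)ᵀφ(k_j) and exploits associativity to reuse summed key statistics. A minimal NumPy sketch (all names and shapes here are illustrative, not from the preprint) showing that the direct and linearized forms agree with φ = exp:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6
q = rng.normal(size=d)          # a single query vector
K = rng.normal(size=(n, d))     # n key vectors
V = rng.normal(size=(n, 2))     # n value vectors

phi = np.exp                    # elementwise feature map, phi = exp

# Direct form: normalized kernel weights, as in eq. (4) of [1]
w = phi(q) @ phi(K).T           # phi(q)^T phi(k_j) for each j
out_direct = (w / w.sum()) @ V

# Linearized form: precompute summed key statistics once,
# then each query costs O(d) instead of O(n) kernel evaluations
S = phi(K).T @ V                # sum_j phi(k_j) v_j^T, shape (d, 2)
z = phi(K).sum(axis=0)          # sum_j phi(k_j),       shape (d,)
out_linear = (phi(q) @ S) / (phi(q) @ z)

assert np.allclose(out_direct, out_linear)
```

The assertion passes because the numerator and denominator are both linear in φ(q), so the sums over keys can be pulled out and shared across queries; this is what makes the method "linear" attention.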
Thank you for posting your comment here! That way everyone can benefit from our discussion.
the softmax-log function can be simplified to a form of linear normalization
Ah, I see what you mean. Yes, that's correct! Applying the composition
So yes, you're right: eq. (1) in my preprint is expressible as a variant of linear attention. I will update the FAQ shortly to reflect as much. Thank you again!
PS. @shaochenze , I added a link to your comment in the README. Thank you again!