HazyResearch / m2

Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"

Can I use Monarch Mixer to replace cross attention layers?

autumn-2-net opened this issue · comments

The sequence mixer in the paper doesn't seem to be able to mix sequences of unequal lengths the way cross attention does, because it uses elementwise multiplication. Is this a misunderstanding on my part, or is Monarch Mixer not a replacement for cross attention?
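
To illustrate the concern, here is a minimal PyTorch sketch (my own illustration, not code from this repo) of why an elementwise product ties both operands to the same sequence length, while attention's score matrix bridges unequal lengths:

```python
import torch

L_q, L_kv, d = 16, 32, 8   # query length, key/value length, model dim
q = torch.randn(L_q, d)
kv = torch.randn(L_kv, d)

# Cross attention: the (L_q x L_kv) score matrix bridges the two lengths.
scores = (q @ kv.T).softmax(dim=-1)   # shape (L_q, L_kv)
out = scores @ kv                     # shape (L_q, d)

# An elementwise gate, by contrast, needs matching shapes:
# q * kv  # would raise a shape error, since L_q != L_kv
```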

This is something we're very interested in and still working on! We don't have a formula for it quite yet.

That doesn't sound like good news. It looks like I'll just have to combine Monarch Mixer with cross attention. Is there a performance loss compared to plain attention?

We've seen that we can match self-attention in quality with some gated convolutions (see the paper for details). Cross attention is still an open problem, which we'll be working on!
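
For context, here is a minimal sketch of what a gated convolution sequence mixer can look like. The module below is an illustrative assumption (a depthwise long convolution with a sigmoid gate), not the actual M2 implementation:

```python
import torch
import torch.nn as nn

class GatedConvMixer(nn.Module):
    """Illustrative gated convolution sequence mixer: a depthwise
    convolution over the sequence, gated elementwise by a projection
    of the input (hypothetical, not the M2 code)."""
    def __init__(self, d_model: int, kernel_size: int = 128):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        L = x.size(1)
        # Depthwise conv mixes along the sequence; trim padding back to length L.
        u = self.conv(x.transpose(1, 2))[..., :L].transpose(1, 2)
        # Elementwise gate: both operands share the same (batch, L, d_model) shape.
        return self.out(u * torch.sigmoid(self.gate(x)))

# Usage: y = GatedConvMixer(d_model=64)(torch.randn(2, 256, 64))  # -> (2, 256, 64)
```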

If I use M2, can I drop positional encodings? M2 looks a bit similar to a convolution, which would let the model pick up positional information on its own.