thuml / Flowformer

About: Code release for "Flowformer: Linearizing Transformers with Conservation Flows" (ICML 2022), https://arxiv.org/pdf/2202.06258.pdf

Flowformer_NLP/flow_attention.py raises an error when computing cross-attention with q and kv of different lengths

wanpengxyzz opened this issue · comments

The error occurs at the "(1) incoming and outgoing flow" step:

        sink_incoming = 1.0 / (torch.einsum("nld,nld->nl", q + 1e-6, k.cumsum(dim=1) + 1e-6))

[screenshot of the error]
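To see why this line requires equal lengths: the einsum pairs q and k.cumsum(dim=1) position by position through the shared "l" index. A minimal reproduction with mismatched lengths (shapes here are illustrative, not from the repo) fails immediately:

    import torch

    n, d = 2, 16                # batch size and feature dim (illustrative)
    q = torch.randn(n, 5, d)    # query length 5
    k = torch.randn(n, 7, d)    # key length 7, different from the query length

    # "nld,nld->nl" contracts both operands over the same length index "l",
    # so it only works when q and k have equal lengths:
    sink_incoming = 1.0 / (torch.einsum("nld,nld->nl", q + 1e-6, k.cumsum(dim=1) + 1e-6))
    # -> RuntimeError: the "l" dimensions of the two operands differ (5 vs 7)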

Hello, and thanks for your interest.
(1) The Flow-Attention provided in Flowformer_NLP is for the causal language modeling task, i.e., the case where Q, K, and V all have the same length.
(2) If you are using cross-attention, that corresponds to the non-causal version, which you can find here: https://github.com/thuml/Flowformer/blob/main/Flow_Attention.py
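For reference, here is a minimal sketch of the non-causal incoming/outgoing flow computation, following the paper's formulation rather than the linked file's exact code (the single-head layout, variable names, and sigmoid kernel placement are assumptions of this sketch). Because each flow sums over the other sequence's full length instead of pairing positions, q and k may have different lengths:

    import torch

    def noncausal_flows(q, k, eps=1e-6):
        # q: (n, L_q, d), k: (n, L_k, d); L_q and L_k may differ.
        q, k = torch.sigmoid(q), torch.sigmoid(k)  # non-negative kernel (assumed sigmoid)
        # Incoming flow at each sink (query) i: <q_i, sum_j k_j>, summed over all keys.
        sink_incoming = 1.0 / torch.einsum("nld,nd->nl", q + eps, k.sum(dim=1) + eps)
        # Outgoing flow at each source (key) j: <k_j, sum_i q_i>, summed over all queries.
        source_outgoing = 1.0 / torch.einsum("nsd,nd->ns", k + eps, q.sum(dim=1) + eps)
        return sink_incoming, source_outgoing

    q = torch.randn(2, 5, 16)   # query length 5
    k = torch.randn(2, 7, 16)   # key length 7: no error in the non-causal form
    inc, out = noncausal_flows(q, k)
    print(inc.shape, out.shape)  # torch.Size([2, 5]) torch.Size([2, 7])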

Yes.

OK, thanks for the explanation.