XLechter / SDT

Code for Point Cloud Completion via Skeleton-Detail Transformer


About the Selective Attention mechanism.

CRISZJ opened this issue · comments

commented

Hello, thanks for sharing such an excellent work. When I read your code, I found that it does not seem to use the Selective Attention mechanism; instead, it uses cross-attention. Am I misunderstanding something?

Hi, @CRISZJ
Thanks for your interest in our work. The released pre-trained model uses the original self-attention layer (i.e., selective ratio = 1.0) to get the best performance. You can replace the attention layer in model.py with the selective attention layer, whose implementation we give at L288 in model.py:

class Selective_SA_Layer(nn.Module):

However, it will reduce the performance depending on the selective ratio you choose.
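(For readers who just want the idea without opening model.py: below is a minimal sketch of a selective self-attention layer, assuming the selective ratio controls how many keys each query may attend to via a top-k mask over the attention logits, so that ratio = 1.0 falls back to ordinary self-attention as described above. The class name, argument names, and the top-k interpretation are placeholders for illustration, not the actual Selective_SA_Layer from the repo.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelectiveSelfAttention(nn.Module):
    """Sketch of a selective self-attention layer for point features.

    Each query attends only to the top-k keys, with k = ratio * N.
    With ratio = 1.0 this reduces to plain self-attention.
    """

    def __init__(self, channels, ratio=1.0):
        super().__init__()
        self.q_conv = nn.Conv1d(channels, channels // 4, 1, bias=False)
        self.k_conv = nn.Conv1d(channels, channels // 4, 1, bias=False)
        self.v_conv = nn.Conv1d(channels, channels, 1)
        self.ratio = ratio

    def forward(self, x):                          # x: (B, C, N) point features
        q = self.q_conv(x).permute(0, 2, 1)        # (B, N, C/4)
        k = self.k_conv(x)                         # (B, C/4, N)
        v = self.v_conv(x)                         # (B, C, N)
        energy = torch.bmm(q, k)                   # (B, N, N) attention logits

        if self.ratio < 1.0:
            # Keep only the top-k logits per query and mask out the rest.
            n_keep = max(1, int(energy.size(-1) * self.ratio))
            kth = energy.topk(n_keep, dim=-1).values[..., -1:]   # k-th largest logit
            energy = energy.masked_fill(energy < kth, float('-inf'))

        attn = F.softmax(energy, dim=-1)           # (B, N, N) attention weights
        out = torch.bmm(v, attn.permute(0, 2, 1))  # (B, C, N) weighted sum of values
        return out + x                             # residual connection
```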

commented


Okay, thanks for your reply. I have another question after reading your paper. In Section 4.2 you write: 'Adding a position encoding layer can significantly boost the performance for finding long-range relations'. Could you please tell me the performance gap between PE and no PE? :-)

@CRISZJ It seems I didn't give an ablation study about the PE in the paper. It's about 0.2 CD if I remember correctly, which may not be that 'significant' 🤣
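(For context, a common way to realize such a position encoding layer in point-cloud transformers, not necessarily the exact form used in SDT, is a small point-wise MLP over the xyz coordinates whose output is added to the per-point features before the attention layers; a minimal sketch with placeholder names:)

```python
import torch.nn as nn


class PointPositionEncoding(nn.Module):
    """Sketch: map raw xyz coordinates to the feature dimension with a
    point-wise MLP and add the result to the per-point features."""

    def __init__(self, channels):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 1),
        )

    def forward(self, xyz, feat):    # xyz: (B, 3, N), feat: (B, C, N)
        return feat + self.mlp(xyz)  # position-aware features, same shape as feat
```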

commented


OK, thanks for your reply again. Got it.