fcjian / TOOD

TOOD: Task-aligned One-stage Object Detection, ICCV2021 Oral

Layer Attention instead of Channel Attention?

iumyx2612 opened this issue · comments

Why did you choose Layer Attention instead of normal Channel Attention?
The task-interactive features are concatenated after N consecutive conv layers, so applying Channel Attention on top could further separate each channel toward its specific task. Layer Attention also performs a separation along the channel dimension, but it can only separate the channels in groups of 6 (one group per layer), right?

The tasks of object classification and localization have different targets, and thus focus on different types of features (e.g. different levels or receptive fields). The N interactive layers in T-head have different effective receptive fields, which allow them to capture multiple levels of semantics. The layer attention is designed to make full use of this rich information by computing more meaningful features from those layers.

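For readers unfamiliar with the design, here is a minimal PyTorch sketch of what layer attention over the N interactive layers can look like. The module name, shapes, and the FC-based gate are illustrative assumptions for this discussion, not the repository's actual implementation (which may differ, e.g. in how the weights are computed).

```python
import torch
import torch.nn as nn

class LayerAttention(nn.Module):
    """Illustrative sketch: compute one scalar weight per interactive layer and
    re-weight that layer's whole feature map (not the repository's exact code)."""
    def __init__(self, num_layers: int, channels: int, reduction: int = 8):
        super().__init__()
        total = num_layers * channels
        self.fc = nn.Sequential(
            nn.Linear(total, total // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(total // reduction, num_layers),  # only N outputs: one per layer
            nn.Sigmoid(),
        )

    def forward(self, feats):
        # feats: list of N tensors of shape (B, C, H, W) from the interactive conv layers
        x = torch.cat(feats, dim=1)                     # (B, N*C, H, W)
        gap = x.mean(dim=(2, 3))                        # global average pooling -> (B, N*C)
        w = self.fc(gap)                                # (B, N) layer weights
        out = [f * w[:, k].view(-1, 1, 1, 1) for k, f in enumerate(feats)]
        return torch.cat(out, dim=1)                    # re-weighted, concatenated features
```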

I understand the intuition behind Layer Attention. However, Layer Attention is just a special case of Channel Attention if we apply Channel Attention to the concatenated feature maps from the N interactive layers. So Channel Attention can also capture multi-level semantic features, no? 🤔

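For concreteness, the re-weighting step of layer attention is what channel attention over the concatenation reduces to when the weights are tied within each layer's group of channels. A quick sketch (shapes are assumptions, and this ignores how the weights themselves are computed):

```python
import torch

B, N, C, H, W = 2, 6, 256, 20, 20
x = torch.randn(B, N * C, H, W)               # concatenated interactive features

layer_w = torch.rand(B, N)                    # layer attention: one weight per layer
tied_w = layer_w.repeat_interleave(C, dim=1)  # tie the weight across each layer's C channels

out_channel_style = x * tied_w.view(B, N * C, 1, 1)  # "channel attention" with tied weights
out_layer_style = torch.cat(
    [x[:, k * C:(k + 1) * C] * layer_w[:, k].view(B, 1, 1, 1) for k in range(N)], dim=1
)
print(torch.allclose(out_channel_style, out_layer_style))  # True: the two coincide
```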
@fcjian Hello, can we discuss this when you have free time?

@iumyx2612 The typical channel-wise attention is applied to a single layer. In our view, conducting channel-wise attention across multiple layers can be seen as a combination of the typical channel-wise attention and layer attention. So it can also capture multi-level semantic features, but it requires more parameters and FLOPs.

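As a rough, illustrative comparison of the extra cost, assuming a squeeze-and-excitation-style two-FC gate with reduction ratio r (the values and numbers below are assumptions, not measured from this repo):

```python
# N interactive layers, C channels per layer, reduction ratio r (assumed values)
N, C, r = 6, 256, 8
total = N * C                                   # channels of the concatenated feature

# channel attention over the concatenation: FC(total -> total/r) + FC(total/r -> total)
channel_attn_params = total * (total // r) + (total // r) * total

# layer attention: FC(total -> total/r) + FC(total/r -> N)
layer_attn_params = total * (total // r) + (total // r) * N

print(f"channel attention params: {channel_attn_params:,}")  # 589,824
print(f"layer attention params:   {layer_attn_params:,}")    # 296,064
```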

Thank you very much, understood