DocF / CMAFF

Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

Authors: FANG Qingyun and WANG Zhaokui

Intro

CMAFF: Cross-Modality Attentive Feature Fusion

Differential Enhancive Module

Common Selective Module

Fusing the complementary information in multispectral remote sensing image pairs across modalities can improve the perception ability of detection algorithms, making them more robust and reliable in a wider range of applications, such as nighttime detection. In contrast to prior methods, we argue that different features should be handled differently: modality-specific features should be retained and enhanced, while modality-shared features should be cherry-picked from the RGB and thermal IR modalities. Following this idea, we propose a novel, lightweight multispectral feature fusion approach with joint common-modality and differential-modality attention, named Cross-Modality Attentive Feature Fusion (CMAFF). Given the intermediate feature maps of the RGB and IR images, the module infers attention maps in parallel from two separate branches, the common-modality and the differential-modality; each attention map is then multiplied with the input feature maps for adaptive feature selection or enhancement, respectively. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art performance at a low computational cost. For more details, please refer to our paper.
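To make the two-branch idea concrete, here is a toy, parameter-free sketch in plain Python. It is not the implementation from the paper or this repo. The function names, the `(RGB - IR) / 2` and `(RGB + IR) / 2` splits, the pooling-plus-sigmoid gate, and the placeholder scalars `w_rgb` and `w_ir` (standing in for learned layers) are all assumptions made for illustration. Feature maps are nested lists shaped `[C][H][W]`.

```python
import math

def pool(feat):
    """Global average pooling: one scalar per channel of a [C][H][W] nested list."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]

def combine(a, b, fn):
    """Apply fn element-wise to two feature maps of identical shape."""
    return [[[fn(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def scale(feat, weights):
    """Multiply each channel of feat by its scalar attention weight."""
    return [[[w * x for x in row] for row in ch] for ch, w in zip(feat, weights)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def differential_enhance(rgb, ir):
    """Enhance both inputs with a channel gate inferred from their difference."""
    # Assumed split: the differential (modality-specific) part is half the difference.
    diff = combine(rgb, ir, lambda r, t: (r - t) / 2)
    # Assumed gate: pooling + sigmoid stands in for the learned attention layers.
    gate = [sigmoid(d) for d in pool(diff)]
    # Residual form (1 + gate) keeps the original features and adds the enhancement.
    boost = [1.0 + g for g in gate]
    return scale(rgb, boost), scale(ir, boost)

def common_select(rgb, ir, w_rgb=1.0, w_ir=-1.0):
    """Pick between the modalities per channel with weights inferred from their mean."""
    # Assumed split: the common (modality-shared) part is the mean of both inputs.
    common = combine(rgb, ir, lambda r, t: (r + t) / 2)
    sel_rgb = []
    for c in pool(common):
        # Two-way softmax; w_rgb and w_ir are hypothetical stand-ins for learned weights.
        e_rgb, e_ir = math.exp(w_rgb * c), math.exp(w_ir * c)
        sel_rgb.append(e_rgb / (e_rgb + e_ir))
    sel_ir = [1.0 - s for s in sel_rgb]
    picked = combine(scale(rgb, sel_rgb), scale(ir, sel_ir), lambda r, t: r + t)
    return picked, sel_rgb, sel_ir

if __name__ == "__main__":
    rgb = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    ir = [[[0.0, 1.0], [1.0, 2.0]], [[0.5, 0.5], [0.5, 0.5]]]
    enhanced_rgb, enhanced_ir = differential_enhance(rgb, ir)
    picked, sel_rgb, sel_ir = common_select(rgb, ir)
    print("selection weights (RGB, IR):", list(zip(sel_rgb, sel_ir)))
```

The sketch returns the two branch outputs separately and leaves open how they are merged into the final fused feature; see the paper for the actual design. The only property it is meant to show is that the differential branch scales features up (enhancement), while the common branch produces per-channel weights that sum to one across the two modalities (selection).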

Citation

If you find this repo useful for your research, please cite our paper:

@article{qingyun2022cross,
  title={Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery},
  author={Qingyun, Fang and Zhaokui, Wang},
  journal={Pattern Recognition},
  pages={108786},
  year={2022},
  publisher={Elsevier}
}

Result

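| Model   | Attention | Params (M) | FLOPs (M) | Mem R+W (MB) |
|---------|-----------|------------|-----------|--------------|
| Yolov5l | MCFF_1    | 0.03       | 0.06      | 0.13         |
|         | MCFF_2    | 0.06       | 0.13      | 0.26         |
|         | MCFF_3    | 0.16       | 0.50      | 1.02         |
|         | Average   | 0.08       | 0.23      | 0.47         |
| Yolov5l | GFU_1     | 2.38       | 30400     | 103.25       |
|         | GFU_2     | 9.50       | 30400     | 84.88        |
|         | GFU_3     | 38.00      | 30400     | 175.44       |
|         | Average   | 16.63      | 30400     | 121.19       |
| Yolov5l | CMAFF_1   | 0.04       | 0.08      | 0.16         |
|         | CMAFF_2   | 0.08       | 0.16      | 0.33         |
|         | CMAFF_3   | 0.31       | 0.62      | 1.28         |
|         | Average   | 0.14       | 0.29      | 0.59         |
| Yolov5s | MCFF_1    | 0.02       | 0.03      | 0.07         |
|         | MCFF_2    | 0.03       | 0.06      | 0.13         |
|         | MCFF_3    | 0.06       | 0.13      | 0.26         |
|         | Average   | 0.04       | 0.07      | 0.15         |
| Yolov5s | GFU_1     | 0.59       | 7600      | 49.25        |
|         | GFU_2     | 9.50       | 7600      | 32.94        |
|         | GFU_3     | 38.00      | 7600      | 49.72        |
|         | Average   | 4.16       | 7600      | 43.97        |
| Yolov5s | CMAFF_1   | 0.02       | 0.04      | 0.08         |
|         | CMAFF_2   | 0.04       | 0.08      | 0.16         |
|         | CMAFF_3   | 0.08       | 0.16      | 0.33         |
|         | Average   | 0.05       | 0.09      | 0.19         |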
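
As a quick sanity check on how the "Average" rows relate to the per-stage rows, the snippet below recomputes the Yolov5l parameter averages from the table above and compares GFU with CMAFF. The dictionary layout and variable names are illustrative only; the numbers are copied from the table, not re-measured.

```python
from statistics import mean

# Per-stage parameter counts (in millions) from the Yolov5l rows of the table.
params_m = {
    "MCFF": [0.03, 0.06, 0.16],
    "GFU": [2.38, 9.50, 38.00],
    "CMAFF": [0.04, 0.08, 0.31],
}

# Each "Average" row is the mean over the three fusion stages, rounded to 2 decimals.
averages = {name: round(mean(vals), 2) for name, vals in params_m.items()}

# Relative parameter cost of GFU versus CMAFF, based on the rounded averages.
ratio = averages["GFU"] / averages["CMAFF"]

print(averages)
print(f"GFU / CMAFF parameter ratio: {ratio:.1f}")
```

With these figures, GFU needs on the order of a hundred times more parameters than CMAFF on Yolov5l, while CMAFF stays close to MCFF.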
