Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

Authors: FANG Qingyun and WANG Zhaokui

Intro

Figure: CMAFF: Cross-Modality Attentive Feature Fusion

Figure: Differential Enhancive Module

Figure: Common Selective Module

Fusing the complementary information of multispectral remote sensing image pairs across modalities can improve the perception ability of detection algorithms, making them more robust and reliable in a wider range of applications, such as nighttime detection. Unlike prior methods, we argue that different kinds of features should be processed differently: modality-specific features should be retained and enhanced, while modality-shared features should be selectively picked from the RGB and thermal IR modalities. Following this idea, we propose a novel, lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions, named Cross-Modality Attentive Feature Fusion (CMAFF). Given the intermediate feature maps of an RGB image and an IR image, our module infers attention maps in parallel along two separate branches, common-modality and differential-modality; the attention maps are then multiplied with the input feature maps for adaptive feature selection and enhancement, respectively. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art performance at a low computational cost. For more details, please refer to our paper.
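The two parallel branches described above can be illustrated with a minimal pure-Python sketch. It is not the implementation from the paper or this repository; the exact attention equations are given in the paper, and everything below is an assumption made for illustration. In particular, the per-channel scalar weights `w_diff` and `w_common` are hypothetical stand-ins for the learned MLP of a channel-attention block; the tanh-signed re-weighting in `differential_enhance`, the sigmoid soft switch in `common_select`, and the final summation in `cmaff_fuse` are hypothetical design choices; and feature maps are plain nested lists (channel by flattened spatial positions) instead of tensors.

```python
import math
from typing import Callable, List, Tuple

# A feature map is a list of channels; each channel is a flat list of its
# H*W spatial activations (a pure-Python stand-in for a C x H x W tensor).
Feature = List[List[float]]


def global_avg_pool(x: Feature) -> List[float]:
    """Squeeze each channel to one descriptor: the mean over positions."""
    return [sum(ch) / len(ch) for ch in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(x: Feature, weights: List[float],
                 act: Callable[[float], float]) -> List[float]:
    """Channel attention act(w_c * GAP(x)_c), one value per channel.

    Simplification (assumption): a single hypothetical learned scalar per
    channel replaces the small MLP used by typical channel-attention blocks.
    """
    return [act(w * d) for w, d in zip(weights, global_avg_pool(x))]


def scale(x: Feature, gate: List[float]) -> Feature:
    """Multiply every activation of channel c by gate[c]."""
    return [[g * v for v in ch] for ch, g in zip(x, gate)]


def add(a: Feature, b: Feature) -> Feature:
    return [[u + v for u, v in zip(ca, cb)] for ca, cb in zip(a, b)]


def sub(a: Feature, b: Feature) -> Feature:
    return [[u - v for u, v in zip(ca, cb)] for ca, cb in zip(a, b)]


def differential_enhance(rgb: Feature, ir: Feature,
                         w_diff: List[float]) -> Tuple[Feature, Feature]:
    """Differential branch: attention computed from R - T.

    Hypothetical re-weighting in residual form R + g*R and T - g*T: tanh
    gives a signed gate, so a positive gate boosts RGB and damps IR in that
    channel, and a negative gate does the opposite.
    """
    g = channel_gate(sub(rgb, ir), w_diff, math.tanh)
    rgb_out = scale(rgb, [1.0 + v for v in g])
    ir_out = scale(ir, [1.0 - v for v in g])
    return rgb_out, ir_out


def common_select(rgb: Feature, ir: Feature, w_common: List[float]) -> Feature:
    """Common branch: attention computed from R + T, used as a soft switch
    that picks, per channel, the convex mix g*RGB + (1 - g)*IR."""
    g = channel_gate(add(rgb, ir), w_common, sigmoid)
    return add(scale(rgb, g), scale(ir, [1.0 - v for v in g]))


def cmaff_fuse(rgb: Feature, ir: Feature,
               w_diff: List[float], w_common: List[float]) -> Feature:
    """Run both branches in parallel, then sum them (assumed fusion rule)."""
    rgb_e, ir_e = differential_enhance(rgb, ir, w_diff)
    common = common_select(rgb, ir, w_common)
    return add(add(rgb_e, ir_e), common)


# Toy example: 2 channels x 2 spatial positions per modality.
rgb = [[1.0, 3.0], [0.0, 0.0]]
ir = [[1.0, 1.0], [2.0, 2.0]]
fused = cmaff_fuse(rgb, ir, w_diff=[1.0, 1.0], w_common=[0.0, 0.0])
print(fused)
```

With all-zero common weights the sigmoid gate is 0.5, so the common branch reduces to the plain average of the two modalities; in a trained model these gates would be learned. Because tanh is signed, the differential gate in this sketch can favour either modality per channel, whereas the sigmoid gate always yields a convex combination of the RGB and IR features.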

Citation

If you find this repository useful in your research, please consider citing our paper:

@article{qingyun2022cross,
  title={Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery},
  author={Fang, Qingyun and Wang, Zhaokui},
  journal={Pattern Recognition},
  pages={108786},
  year={2022},
  publisher={Elsevier}
}

Results

Complexity of the attention modules (MCFF, GFU and CMAFF) in YOLOv5l and YOLOv5s:

Model	Attention	Params (M)	FLOPs (M)	Mem R+W (MB)
Yolov5l	MCFF_1	0.03	0.06	0.13
	MCFF_2	0.06	0.13	0.26
	MCFF_3	0.16	0.50	1.02
	Average	0.08	0.23	0.47
Yolov5l	GFU_1	2.38	30400	103.25
	GFU_2	9.50	30400	84.88
	GFU_3	38.00	30400	175.44
	Average	16.63	30400	121.19
Yolov5l	CMAFF_1	0.04	0.08	0.16
	CMAFF_2	0.08	0.16	0.33
	CMAFF_3	0.31	0.62	1.28
	Average	0.14	0.29	0.59
Yolov5s	MCFF_1	0.02	0.03	0.07
	MCFF_2	0.03	0.06	0.13
	MCFF_3	0.06	0.13	0.26
	Average	0.04	0.07	0.15
Yolov5s	GFU_1	0.59	7600	49.25
	GFU_2	2.38	7600	32.94
	GFU_3	9.50	7600	49.72
	Average	4.16	7600	43.97
Yolov5s	CMAFF_1	0.02	0.04	0.08
	CMAFF_2	0.04	0.08	0.16
	CMAFF_3	0.08	0.16	0.33
	Average	0.05	0.09	0.19
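The Average rows are consistent with a column-wise mean of the three module instances listed above them. The short script below, using the YOLOv5l values copied from the table, reproduces those averages and the resulting GFU-to-CMAFF ratios; the dictionary layout and the `average_row` helper are introduced only for this illustration.

```python
from statistics import mean

# Rows _1, _2, _3 for YOLOv5l, copied from the table above.
# Each tuple: (Params in M, FLOPs in M, memory read+write in MB)
yolov5l = {
    "MCFF": [(0.03, 0.06, 0.13), (0.06, 0.13, 0.26), (0.16, 0.50, 1.02)],
    "GFU": [(2.38, 30400, 103.25), (9.50, 30400, 84.88), (38.00, 30400, 175.44)],
    "CMAFF": [(0.04, 0.08, 0.16), (0.08, 0.16, 0.33), (0.31, 0.62, 1.28)],
}


def average_row(rows):
    """Column-wise mean of the three instances, rounded to 2 decimals like the table."""
    return tuple(round(mean(col), 2) for col in zip(*rows))


averages = {name: average_row(rows) for name, rows in yolov5l.items()}
for name, avg in averages.items():
    print(name, avg)

# How much cheaper CMAFF is than GFU on average (parameters and memory traffic).
gfu, cmaff = averages["GFU"], averages["CMAFF"]
print(f"params: {gfu[0] / cmaff[0]:.0f}x, memory: {gfu[2] / cmaff[2]:.0f}x")
```

On these numbers, CMAFF in YOLOv5l needs roughly 119 times fewer parameters and 205 times less memory traffic than GFU on average, which illustrates the low computational cost claimed above.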
