ykshi/VehicleMAE

Official PyTorch implementation of Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception, Xiao Wang, Wentao Wu, Chenglong Li, Zhicheng Zhao, Zhe Chen, Yukai Shi, Jin Tang, AAAI-2024 [arXiv]

Abstract

Our Proposed Framework VehicleMAE

Environment Setting

Dataset Download

Pre-trained Model Download

Pre-trained Model	ViT-Base
Pre-trained checkpoint	download
Extraction code	6zkx

Training

# To pre-train VehicleMAE on a single GPU, run:
CUDA_VISIBLE_DEVICES=0 python main.py
# To pre-train VehicleMAE on multiple GPUs (four in this example), run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main.py
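
The two commands above differ only in the list of visible GPUs and in whether `torch.distributed.launch` is used. As a minimal sketch, the hypothetical helper below (`build_pretrain_command` is not part of this repository) builds either command from a list of GPU ids. It assumes that `main.py` needs no further arguments, as in the commands shown above.

```python
import shlex


def build_pretrain_command(gpu_ids, script="main.py"):
    """Return (CUDA_VISIBLE_DEVICES value, argv list) for the given GPU ids.

    Hypothetical helper for illustration only; it mirrors the two launch
    commands listed in the Training section.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    visible = ",".join(str(g) for g in gpu_ids)
    if len(gpu_ids) == 1:
        # Single GPU: run the script directly.
        argv = ["python", script]
    else:
        # Multiple GPUs: one process per GPU via torch.distributed.launch.
        argv = [
            "python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={len(gpu_ids)}", script,
        ]
    return visible, argv


# Print the shell form of the four-GPU command.
visible, argv = build_pretrain_command([0, 1, 2, 3])
print(f"CUDA_VISIBLE_DEVICES={visible} {shlex.join(argv)}")
```

To launch the job from Python instead of printing it, the returned values could be passed to `subprocess.run(argv, env={**os.environ, "CUDA_VISIBLE_DEVICES": visible}, check=True)`.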

Experimental Results

We used full fine-tuning to test the pre-trained model on four downstream tasks. The results are shown in the table below.

Method	Pre-training Dataset	VAR mA	VAR Acc	VAR F1	V-Reid mAP	V-Reid R1	VFR Acc	VPS mIoU	VPS mAcc
Scratch	-	84.67	80.86	84.90	35.3	57.3	24.8	49.36	59.22
MoCov3	Imagenet1K	90.38	93.88	95.33	75.5	94.4	91.3	73.17	78.60
DINO	Imagenet1K	89.92	91.09	93.11	64.3	91.5	-	68.43	73.37
IBOT	Imagenet1K	89.51	90.17	92.37	68.9	92.6	81.1	66.03	71.06
MAE	Imagenet1K	89.69	93.60	95.08	76.7	95.8	91.2	69.54	75.36
MAE	Autobot1M	90.19	94.06	95.43	75.5	95.4	91.3	69.00	75.36
VehicleMAE	Autobot1M	92.21	94.91	96.17	85.6	97.9	94.5	73.29	80.22

The four downstream tasks are vehicle attribute recognition (VAR), vehicle re-identification (V-Reid), vehicle fine-grained recognition (VFR), and vehicle part segmentation (VPS).
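
To make the comparison easier to read, the short sketch below computes the absolute gain of VehicleMAE over MAE when both are pre-trained on Autobot1M. The numbers are copied from the table above; the variable and function names are illustrative only.

```python
# Metric order follows the table columns.
METRICS = [
    "VAR mA", "VAR Acc", "VAR F1",
    "V-Reid mAP", "V-Reid R1",
    "VFR Acc",
    "VPS mIoU", "VPS mAcc",
]

# Rows copied from the results table (both pre-trained on Autobot1M).
MAE_AUTOBOT1M = [90.19, 94.06, 95.43, 75.5, 95.4, 91.3, 69.00, 75.36]
VEHICLEMAE = [92.21, 94.91, 96.17, 85.6, 97.9, 94.5, 73.29, 80.22]


def gains(ours, baseline):
    """Absolute per-metric difference, rounded to two decimals."""
    return [round(o - b, 2) for o, b in zip(ours, baseline)]


for name, gain in zip(METRICS, gains(VEHICLEMAE, MAE_AUTOBOT1M)):
    print(f"{name}: +{gain:.2f}")
```

With the same pre-training data, the largest gain is on V-Reid mAP (85.6 vs. 75.5), and VehicleMAE is ahead on all eight metrics.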

Visual Results

Acknowledgement

Citation

If you find this work helpful for your research, please cite the following paper and give us a star.

@misc{wang2023structural,
      title={Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception}, 
      author={Xiao Wang and Wentao Wu and Chenglong Li and Zhicheng Zhao and Zhe Chen and Yukai Shi and Jin Tang},
      year={2023},
      eprint={2312.09812},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

If you have any problems with this work, please open an issue.
