Diversity-Aware Meta Visual Prompting (CVPR 2023)

This repository provides the official PyTorch implementation of the following conference paper:

Diversity-Aware Meta Visual Prompting (CVPR 2023)
Qidong Huang1, Xiaoyi Dong1, Dongdong Chen2, Weiming Zhang1, Feifei Wang1, Gang Hua3, Nenghai Yu1
1University of Science and Technology of China, 2Microsoft Cloud AI, 3Wormpex AI Research

Environment Setup

This code is tested with Python 3.8, PyTorch 1.11, and CUDA 11.3, and requires the following dependencies:

  • timm = 0.4.9
  • lpips = 0.1.4
  • opencv-python = 4.6.0.66

To set up a conda environment, use the following commands:

conda env create -f environment.yaml
conda activate dam_vp
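
To quickly confirm that the environment matches the tested versions, a short check like the one below can be run; it only prints the installed versions of the dependencies listed above and is not part of the official codebase.

# sanity-check that the key dependencies match the tested versions
from importlib.metadata import version
import torch

print("torch        :", torch.__version__)         # tested with 1.11
print("cuda         :", torch.version.cuda)        # tested with 11.3
print("timm         :", version("timm"))           # tested with 0.4.9
print("lpips        :", version("lpips"))          # tested with 0.1.4
print("opencv-python:", version("opencv-python"))  # tested with 4.6.0.66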

Dataset Preparation

The Fine-Grained Visual Classification (FGVC) datasets can be downloaded from the VPT repo. The Fru92 and Veg200 datasets can be downloaded from VegFru. All other datasets are available via torchvision.

To prepare the VTAB-1k datasets, run:

python data_utils/vtab_prep.py

For more details on how to download VTAB-1k, please refer to VTAB_SETUP.md.

The overall directory structure should be:

│DAM-VP/
├──data/
│   ├──FGVC/
│   │   ├──CUB_200_2011/
│   │   ├──OxfordFlower/
│   │   ├──Stanford-cars/
│   │   ├──Stanford-dogs/
│   │   ├──nabirds/
│   ├──VTAB/
│   │   ├──.......
│   ├──finegrained_dataset/
│   │   ├──vegfru-dataset/
│   ├──torchvision_dataset/
│   │   ├──.......
├──.......
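
If it helps, a small helper script like the one below (not part of the repo) can verify that the top-level dataset folders described above are in place before training; the folder names are taken directly from the tree, and --base_dir is the same path passed to the training scripts later.

# check that the expected dataset folders exist under the data root
import argparse
from pathlib import Path

EXPECTED = [
    "FGVC/CUB_200_2011",
    "FGVC/OxfordFlower",
    "FGVC/Stanford-cars",
    "FGVC/Stanford-dogs",
    "FGVC/nabirds",
    "VTAB",
    "finegrained_dataset/vegfru-dataset",
    "torchvision_dataset",
]

parser = argparse.ArgumentParser()
parser.add_argument("--base_dir", required=True, help="root of the data/ directory")
args = parser.parse_args()

for rel in EXPECTED:
    path = Path(args.base_dir) / rel
    print(("found   " if path.is_dir() else "MISSING ") + str(path))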

Pre-trained Model Preparation

The pre-trained vision models we use are detailed in Table 8 of our paper. Their checkpoints can be downloaded here:

Backbone  | Pre-trained Objective | Pre-trained Dataset | Download | md5sum
ViT-B/16  | Supervised            | ImageNet-1k         | Download | -
ViT-B/16  | Supervised            | ImageNet-22k        | Download | -
ViT-B/16  | CLIP                  | 400M Web Data       | Download | -
Swin-B    | Supervised            | ImageNet-22k        | Download | bf9cc1
ViT-B/16  | MoCo v3               | ImageNet-1k         | Download | -
ResNet-50 | Supervised            | ImageNet-1k         | Download | -
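
Where an md5sum is listed (e.g. bf9cc1 for the Swin-B checkpoint), a downloaded file can be verified with a short script like the one below; the file name in the comment is only an example and should be replaced with the actual name of the downloaded checkpoint.

# compute the md5 digest of a downloaded checkpoint and compare it against the listed prefix
import hashlib
import sys

def md5sum(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # usage: python check_md5.py /path/to/checkpoint.pth bf9cc1
    path, expected_prefix = sys.argv[1], sys.argv[2]
    actual = md5sum(path)
    print(actual)
    print("match" if actual.startswith(expected_prefix) else "MISMATCH")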

Meta Prompt Initialization

The meta-trained prompts are available here; you can download these checkpoints directly and place them under ./meta-training/checkpoints/. Alternatively, you can run the meta training of visual prompts yourself with the instructions below.

  • For the head-freezing/missing scenario, run:
cd meta-training/
# if prompting on vit-b-1k
python main_hf.py --base_dir /your/path/to/dataset/ --pretrained_model vit-b-1k --meta_lr 0.5 --update_lr 0.5 --update_step 4 --meta_step_size 0.5 --test_dataset oxford-flowers
# if prompting on clip-vit-b
python main_clip.py --base_dir /your/path/to/dataset/  --pretrained_model clip-vit-b --meta_lr 1.0 --update_lr 1.0 --update_step 4 --meta_step_size 0.5
  • For the head-tuning scenario, run:
cd meta-training/
# if prompting on vit-b-22k
python main_ht.py --base_dir /your/path/to/dataset/ --pretrained_model vit-b-22k --meta_lr 1.0 --update_lr 1.0 --update_step 4 --meta_step_size 0.5 --weight_decay 1e-4  --test_dataset oxford-flowers
# if prompting on swin-b-22k
python main_ht.py --base_dir /your/path/to/dataset/ --pretrained_model swin-b-22k --meta_lr 0.5 --update_lr 0.5 --update_step 4 --meta_step_size 0.5 --weight_decay 1e-4
# if prompting on moco-v3-b-1k
python main_ht.py --base_dir /your/path/to/dataset/ --pretrained_model moco-v3-b-1k --meta_lr 0.5 --update_lr 0.5 --update_step 4 --meta_step_size 0.5 --weight_decay 1e-4
# if prompting on resnet50-1k
python main_ht.py --base_dir /your/path/to/dataset/ --pretrained_model resnet50-1k --meta_lr 0.5 --update_lr 0.5 --update_step 4 --meta_step_size 0.5 --weight_decay 1e-4
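
Whether you download the released prompts or meta-train them yourself, a quick way to sanity-check a checkpoint before task adapting is to load it and list its contents. The snippet below is only a sketch: it assumes the file is a standard torch.save object (the exact keys depend on the training script), and the path matches the checkpoint name used in the task-adapting commands below.

# inspect a meta-trained prompt checkpoint (assumes a plain torch.save file)
import torch

ckpt_path = "./meta-training/checkpoints/vit-b-1k-wo-head.pth"
ckpt = torch.load(ckpt_path, map_location="cpu")

if isinstance(ckpt, dict):
    for key, value in ckpt.items():
        shape = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(key, shape)
else:
    print(type(ckpt))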

Diversity-Aware Prompting

With the meta-trained visual prompts, we can adapt pre-trained vision models to unseen vision datasets. The hyper-parameter configurations can be found in Table 13 and Table 14 of our paper.

  • For the head-freezing/missing scenario, run:
cd task_adapting/
# if prompting on vit-b-1k
python main.py --base_dir /your/path/to/dataset/ --pretrained_model vit-b-1k --adapt_method prompt_wo_head --test_dataset /select/one/dataset/ --epochs 50 --lr /learning/rate/ --weight_decay /weight/decay/rate/ --checkpoint_dir ../meta-training/checkpoints/vit-b-1k-wo-head.pth
# if prompting on clip-vit-b
python main_clip.py --base_dir /your/path/to/dataset/ --pretrained_model clip-vit-b --adapt_method prompt_wo_head --test_dataset /select/one/dataset/ --epochs 50 --lr /learning/rate/ --weight_decay /weight/decay/rate/ --checkpoint_dir ../meta-training/checkpoints/clip-vit-b-wo-head.pth
  • For the head-tuning scenario, run:
cd task_adapting/
# if prompting on vit-b-22k
python main.py --base_dir /your/path/to/dataset/ --pretrained_model vit-b-22k --adapt_method ours_with_head --test_dataset /select/one/dataset/ --epochs 50 --lr /learning/rate/ --weight_decay /weight/decay/rate/ --checkpoint_dir ../meta-training/checkpoints/vit-b-22k-w-head.pth
# if prompting on swin-b-22k
python main.py --base_dir /your/path/to/dataset/ --pretrained_model swin-b-22k --adapt_method ours_with_head --test_dataset /select/one/dataset/ --epochs 50 --lr /learning/rate/ --weight_decay /weight/decay/rate/ --checkpoint_dir ../meta-training/checkpoints/swin-b-22k-w-head.pth
# if prompting on moco-v3-b-1k
python main.py --base_dir /your/path/to/dataset/ --pretrained_model moco-v3-b-1k --adapt_method ours_with_head --test_dataset /select/one/dataset/ --epochs 50 --lr /learning/rate/ --weight_decay /weight/decay/rate/ --checkpoint_dir ../meta-training/checkpoints/moco-v3-b-1k-w-head.pth
# if prompting on resnet50-1k
python main.py --base_dir /your/path/to/dataset/ --pretrained_model resnet50-1k --adapt_method ours_with_head --test_dataset /select/one/dataset/ --epochs 50 --lr /learning/rate/ --weight_decay /weight/decay/rate/ --checkpoint_dir ../meta-training/checkpoints/resnet50-1k-w-head.pth

Acknowledgement

This repo is partially based on VP and VPT. Thanks for their impressive work!

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{huang2023damvp,
  title={Diversity-Aware Meta Visual Prompting},
  author={Qidong Huang and Xiaoyi Dong and Dongdong Chen and Weiming Zhang and Feifei Wang and Gang Hua and Nenghai Yu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

License

The code is released under MIT License (see LICENSE file for details).