

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach (CVPR 2024)

Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

This repository contains the source code for our CVPR 2024 paper, Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach.

🎬 Getting Started

1️⃣ Requirements

We used Python 3.9.0 in our experiments; the full list of required packages is available in requirements.txt. You can install them with pip install -r requirements.txt.
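Before compiling the CUDA kernel in the next step, a quick sanity check along these lines can confirm that the interpreter and GPU setup are in order (this is only a suggested check, assuming PyTorch is installed via requirements.txt):

# Optional sanity check (assumes PyTorch was installed from requirements.txt).
import sys
import torch

print("Python:", sys.version.split()[0])              # expected: 3.9.x
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())   # should be True to run the compiled kernel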

Setting up CUDA kernel for MSDeformAttn

After preparing the required environment, run the following commands to compile the CUDA kernel for MSDeformAttn:

cd VisualPromptGFSS/src/model/ops/
sh make.sh
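To verify the build, an import check along the following lines can be run from the src/ directory. The import path and constructor arguments are assumptions based on the ops/ layout used by Deformable-DETR/Mask2Former-style repositories; adjust them to match this repo if they differ:

# Assumed import path and arguments -- verify against model/ops/ in this repo.
from model.ops.modules import MSDeformAttn  # importing this module loads the compiled extension

attn = MSDeformAttn(d_model=256, n_levels=4, n_heads=8, n_points=4)
print("MSDeformAttn constructed:", attn)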

2️⃣ Dataset

We used the versions of PASCAL VOC and MS-COCO provided by DIaM. You can download the datasets from here.

The data folder should look like this:

data
├── coco
│   ├── annotations
│   ├── train
│   ├── train2014
│   ├── val
│   └── val2014
└── pascal
    ├── JPEGImages
    └── SegmentationClassAug
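A small sketch like the following can be used to confirm the layout before running anything; the location of the data root is an assumption, so adjust data_root to wherever you placed the data:

# Checks that the directories from the tree above exist under the data root.
from pathlib import Path

data_root = Path("data")  # assumed path to the data folder; adjust as needed
expected = [
    "coco/annotations", "coco/train", "coco/train2014", "coco/val", "coco/val2014",
    "pascal/JPEGImages", "pascal/SegmentationClassAug",
]
for rel in expected:
    status = "ok" if (data_root / rel).is_dir() else "MISSING"
    print(f"{status:7s} {rel}")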

The train/val split

The train/val splits can be found in the directory src/lists/. We borrowed these lists from https://github.com/Jia-Research-Lab/PFENet.
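If you need to inspect or reuse a split, list files in the PFENet style typically contain one image/label path pair per line; the sketch below shows one way to read such a file (the exact file names and format should be verified against src/lists/):

# Hypothetical reader for a PFENet-style list file of "<image path> <label path>" lines.
from pathlib import Path

def read_split_list(list_path):
    pairs = []
    for line in Path(list_path).read_text().splitlines():
        if line.strip():
            image_rel, label_rel = line.split()
            pairs.append((image_rel, label_rel))
    return pairs

# Example (hypothetical file name):
# pairs = read_split_list("lists/pascal/val.txt")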

3️⃣ Download pre-trained base models

Please download our pre-trained base models from this Google Drive link. Place the initmodel directory inside the src/ directory of this repo; it contains the pre-trained ResNet backbone. The coco and pascal directories contain pre-trained base models for the different splits of COCO-20i and PASCAL-5i.

🗺 Overview of the repo

Default configuration files can be found in config/. The directory src/lists/ contains the train/val splits for each dataset. All the code is provided in src/.

⚙ Training the Base Model (To be updated soon)

Due to time constraints, we have not yet uploaded the source code for training the base model. However, we plan to release it soon. Our primary contribution lies in few-shot optimization and inference, which is why we prioritized releasing that part first. For training the base model, we employed a standard per-pixel cross-entropy loss.
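As an illustration only (the base-training code itself is not released yet), the loss referred to above is the usual per-pixel cross-entropy, which in PyTorch looks roughly like this; the ignore label of 255 is an assumption matching common PASCAL/COCO preprocessing:

# Minimal sketch of a standard per-pixel cross-entropy loss; not the released training code.
import torch
import torch.nn.functional as F

def per_pixel_cross_entropy(logits, target, ignore_index=255):
    """logits: (B, C, H, W) raw class scores; target: (B, H, W) integer labels."""
    return F.cross_entropy(logits, target, ignore_index=ignore_index)

logits = torch.randn(2, 16, 64, 64)          # dummy predictions over 16 base classes
target = torch.randint(0, 16, (2, 64, 64))   # dummy ground-truth labels
print(per_pixel_cross_entropy(logits, target).item())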

🧪 Few-shot fine-tuning

For inductive fine-tuning, please modify coco_m2former.yaml or pascal_m2former.yaml (depending on the dataset you want to run inference on). In that config file, specify the split and number of shots you want to evaluate on, along with the path to the pre-trained base model.

For transductive fine-tuning, please modify coco_m2former_transduction.yaml or pascal_m2former_transduction.yaml in a similar manner, setting the split and number of shots you want to evaluate on.
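As a hypothetical illustration of editing these files programmatically, something like the snippet below could be used; the key names (split, shot, ckpt_path) are assumptions, so check the actual YAML files in config/ for the names this repo uses:

# Key names below are assumptions -- verify against the YAML files in config/.
import yaml

with open("../config/pascal_m2former.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["split"] = 0                          # assumed key: which PASCAL-5i / COCO-20i fold to use
cfg["shot"] = 5                           # assumed key: number of support shots per novel class
cfg["ckpt_path"] = "path/to/base.pth"     # assumed key: pre-trained base model checkpoint

with open("../config/pascal_m2former.yaml", "w") as f:
    yaml.safe_dump(cfg, f)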

To run few-shot inference, first go to the src/ directory and then execute any of the following commands:

python3 test_m2former.py --config ../config/pascal_m2former.yaml  --opts  pi_estimation_strategy self  n_runs 5 gpus [0]  # For pascal inductive inference
python3 test_m2former.py --config ../config/coco_m2former.yaml  --opts  pi_estimation_strategy self  n_runs 5 gpus [0]  # For coco inductive inference
python3 test_m2former.py --config ../config/pascal_m2former_transduction.yaml  --opts  pi_estimation_strategy self  n_runs 5 gpus [0]  # For pascal transductive inference
python3 test_m2former.py --config ../config/coco_m2former_transduction.yaml  --opts  pi_estimation_strategy self  n_runs 5 gpus [0]  # For coco transductive inference
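If you want to sweep all folds and shot counts in one go, a small driver along these lines is one option; it assumes that split and shot can be overridden through --opts in the same way as the keys shown above, which should be verified against test_m2former.py:

# Hypothetical driver; assumes "split" and "shot" are valid --opts overrides.
import subprocess

for split in (0, 1, 2, 3):
    for shot in (1, 5):
        subprocess.run(
            ["python3", "test_m2former.py",
             "--config", "../config/pascal_m2former.yaml",
             "--opts", "split", str(split), "shot", str(shot),
             "pi_estimation_strategy", "self", "n_runs", "5", "gpus", "[0]"],
            check=True,
        )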

πŸ™ Acknowledgments

We thank the authors of DIaM and Mask2Former, whose code inspired parts of ours.

📚 Citation

If you find this project useful, please consider citing:

@inproceedings{hossain2024visual,
  title={Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach},
  author={Hossain, Mir Rayat Imtiaz and Siam, Mennatullah and Sigal, Leonid and Little, James J},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={23470--23480},
  year={2024}
}
