cheungdaven / prompt-pretraining

Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"

Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

"Scaling up prompt learning on ImageNet-21K achieves SOTA on 21 downstream datasets."

Shuhuai Ren, Aston Zhang, Yi Zhu, Shuai Zhang, Shuai Zheng, Mu Li, Alex Smola, Xu Sun

🚀 News

  • (Mar 22, 2023)
    • Code for prompt pre-training (POMP) on ImageNet-21K, plus cross-dataset and cross-task evaluation.
    • Checkpoints of pre-trained POMP prompts, segmentation backbones, and detection backbones.

Highlights

[Main figure]

Main Contributions

  1. We introduce POMP, a prompt pre-training method that is the first to enable prompt learning on large-scale datasets such as ImageNet-21K, which has over twenty thousand classes.
  2. POMP is memory- and computation-efficient. Compared with previous methods such as CoOp, it achieves comparable accuracy on ImageNet-1K while using only 19% of the GPU memory and 50% of the training time.
  3. POMP achieves new state-of-the-art results on a variety of open-vocabulary visual recognition datasets and tasks.
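Why scaling prompt learning to tens of thousands of classes is hard can be seen with a toy model of CoOp-style prompt learning, where M learnable context vectors are shared across classes and each class prompt [V_1, ..., V_M, CLASS] is passed through the text encoder. The sketch below is hypothetical: the mean-pooling "text encoder", the embedding size `DIM`, the temperature `tau`, and every function name are illustrative stand-ins, not code from this repository. It shows that a full-vocabulary softmax needs one text-encoder pass per class (over 20K passes per step on ImageNet-21K), and how contrasting against a sampled subset that always contains the ground-truth class bounds that cost. Treat the sampling step as an assumed illustration rather than a description of POMP's exact algorithm.

```python
import math
import random

DIM = 8  # toy embedding size (assumption)

def normalize(v):
    """L2-normalize a vector (returns it unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def text_encode(token_embs):
    """Toy stand-in for a text encoder: mean-pool token embeddings, then L2-normalize."""
    pooled = [sum(col) / len(token_embs) for col in zip(*token_embs)]
    return normalize(pooled)

def class_logits(image_feat, context, class_embs, class_ids, tau=0.01):
    """Cosine-similarity logits between a normalized image feature and the prompt
    [V_1, ..., V_M, CLASS] of each class in class_ids. Each class needs its own
    text-encoder pass, so cost grows linearly with len(class_ids)."""
    logits = []
    for c in class_ids:
        text_feat = text_encode(context + [class_embs[c]])  # shared context + class token
        logits.append(sum(a * b for a, b in zip(image_feat, text_feat)) / tau)
    return logits

def cross_entropy(logits, target_pos):
    """Softmax cross-entropy, computed stably with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_pos]

def sample_classes(num_classes, target, k, rng):
    """Hypothetical: keep the ground-truth class (at position 0) plus k - 1 random negatives."""
    negatives = rng.sample([c for c in range(num_classes) if c != target], k - 1)
    return [target] + negatives

if __name__ == "__main__":
    rng = random.Random(0)
    num_classes, m_ctx = 1000, 4
    class_embs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(num_classes)]
    context = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(m_ctx)]  # learnable in practice
    image_feat = normalize([rng.gauss(0, 1) for _ in range(DIM)])
    target = 42

    # Full vocabulary: one text-encoder pass for every class.
    full = class_logits(image_feat, context, class_embs, range(num_classes))
    print("full-vocabulary loss:", cross_entropy(full, target))

    # Sampled subset: only 64 passes, independent of the vocabulary size.
    subset = sample_classes(num_classes, target, k=64, rng=rng)
    local = class_logits(image_feat, context, class_embs, subset)
    print("sampled-subset loss:", cross_entropy(local, 0))
```

In a real setup the context vectors would be trained by backpropagating this loss through a frozen CLIP text encoder; the toy above only illustrates how the per-step cost depends on the number of classes in the softmax.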

Installation

For installation and other package requirements, please follow the instructions detailed in INSTALL.md.

Data preparation

Please follow the instructions in DATASETS.md to prepare all datasets.

Pre-trained Models

Please follow the instructions in MODELS.md to prepare all pre-trained models.

Training and Evaluation

Please refer to RUN.md for detailed instructions on training, evaluation, and reproducing the results.


Contact

If you have any questions, please feel free to create an issue on this repository.

Acknowledgements

Our code is based on the CoOp, MaPLe, Dassl, Detic, and ZSSeg repositories. We thank the authors for releasing their code.

License

Apache License 2.0


Languages

Python 89.2%, Shell 10.8%