Awesome Prompting Papers in Computer Vision

Introduction

A curated list of prompt-based papers in computer vision and vision-language learning.

Keywords:

  • Task tag: the downstream task addressed by the paper
  • Abbreviation tag: the abbreviated name of the method
  • Characteristic tag: a characteristic that makes the paper unique
  • Bold font: we highlight pilot work that may have contributed to the prevalence of visual prompting.

Prompt Learning

This section contains papers that design prompt (including adapter) modules for parameter-efficient adaptation of foundation models.
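
To make the setting concrete, below is a minimal, hypothetical PyTorch sketch (not taken from any paper in this list): a handful of learnable prompt tokens are prepended to the patch tokens of a frozen transformer backbone, so only the prompts and a small task head are updated during adaptation. The `PromptTunedBackbone` class and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class PromptTunedBackbone(nn.Module):
    """Minimal prompt-tuning sketch: learnable prompt tokens are prepended to
    the patch-token sequence of a frozen transformer encoder; only the prompts
    and the classification head receive gradients."""

    def __init__(self, encoder: nn.Module, embed_dim: int,
                 num_prompts: int = 10, num_classes: int = 100):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():            # freeze the backbone
            p.requires_grad = False
        self.prompts = nn.Parameter(0.02 * torch.randn(1, num_prompts, embed_dim))
        self.head = nn.Linear(embed_dim, num_classes)  # lightweight task head

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D) embeddings from the frozen patch-embedding layer
        b = patch_tokens.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        feats = self.encoder(tokens)                   # frozen transformer blocks
        return self.head(feats.mean(dim=1))            # pool tokens and classify

# Toy usage with a generic encoder that maps (B, N, D) -> (B, N, D):
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
model = PromptTunedBackbone(encoder, embed_dim=768)
logits = model(torch.randn(4, 196, 768))               # 4 images, 196 patch tokens
```

Adapter-style methods in the list follow the same recipe, but insert small trainable bottleneck layers inside the frozen backbone instead of (or in addition to) prepending tokens.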

Vision Prompt

  • Learning to Prompt for Continual Learning [pdf] [code]

    CVPR 2022

  • Visual Prompt Tuning [pdf] [code]

    ECCV 2022

  • Exploring Visual Prompts for Adapting Large-Scale Models [pdf] [code]

    arXiv 2022/03

  • DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning [pdf] [code]

    ECCV 2022

  • AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition [pdf] [code]

    NeurIPS 2022

  • Vision Transformer Adapter for Dense Predictions [pdf] [code]

    arXiv 2022/05

  • Neural Prompt Search [pdf] [code]

    arXiv 2022/06

  • Convolutional Bypasses Are Better Vision Transformer Adapters [pdf] [code]

    arXiv 2022/07

  • Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets [pdf]

    arXiv 2022/08

  • Prompt Vision Transformer for Domain Generalization [pdf]

    arXiv 2022/08

  • Prompt-Matched Semantic Segmentation [pdf]

    arXiv 2022/08

  • Visual Prompting via Image Inpainting [pdf] [code]

    NeurIPS 2022

  • Visual Prompt Tuning for Test-time Domain Adaptation [pdf]

    arXiv 2022/10

  • Visual Prompting for Adversarial Robustness [pdf]

    arXiv 2022/10

  • Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers [pdf] [code]

    arXiv 2022/10

  • Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [pdf] [code]

    NeurIPS 2022

  • Towards a Unified View on Visual Parameter-Efficient Transfer Learning [pdf] [code]

    arXiv 2022/10

  • Multitask Vision-Language Prompt Tuning [pdf] [code]

    arXiv 2022/11

Vision-Language Prompt

  • Learning Transferable Visual Models From Natural Language Supervision [pdf] [code]

    ICML 2021

  • Learning to Prompt for Vision-Language Models [pdf] [code]

    IJCV 2022

  • Prompt Distribution Learning [pdf]

    CVPR 2022

  • Conditional Prompt Learning for Vision-Language Models [pdf] [code]

    CVPR 2022

  • DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [pdf] [code]

    CVPR 2022

  • Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos [pdf] [code]

    CVPR 2022

  • PointCLIP: Point Cloud Understanding by CLIP [pdf] [code]

    CVPR 2022

  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [pdf] [code]

    CVPR 2022

  • Can Language Understand Depth? [pdf] [code]

    ACM MM 2022

  • Prompting for Multi-Modal Tracking [pdf]

    ACM MM 2022

  • Expanding Language-Image Pretrained Models for General Video Recognition [pdf] [code]

    ECCV 2022

  • Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification [pdf] [code]

    ECCV 2022

  • Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition [pdf] [code]

    arXiv 2023/04

  • Colorful Prompt Tuning for Pre-trained Vision-Language Models [pdf]

    arXiv 2021/08

  • ActionCLIP: A New Paradigm for Video Action Recognition [pdf] [code]

    arXiv 2021/09

  • CLIP-Adapter: Better Vision-Language Models with Feature Adapters [pdf] [code]

    arXiv 2021/10

  • Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization [pdf]

    arXiv 2021/11

  • Prompting Visual-Language Models for Efficient Video Understanding [pdf] [code]

    arXiv 2021/12

  • Unsupervised Prompt Learning for Vision-Language Models [pdf] [code]

    arXiv 2022/04

  • Prompt-aligned Gradient for Prompt Tuning [pdf] [code]

    arXiv 2022/05

  • Parameter-Efficient Image-to-Video Transfer Learning [pdf]

    arXiv 2022/06

  • DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations [pdf]

    arXiv 2022/06

  • Delving into the Openness of CLIP [pdf] [code]

    ACL 2023

  • OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression [pdf]

    NeurIPS 2022

  • Prompt Tuning for Generative Multimodal Pretrained Models [pdf] [code]

    arXiv 2022/06

  • Prompt Tuning with Soft Context Sharing for Vision-Language Models [pdf]

    arXiv 2022/08

  • Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [pdf] [code]

    NeurIPS 2022

  • CPL: Counterfactual Prompt Learning for Vision and Language Models [pdf] [code]

    arXiv 2022/10

  • Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [pdf] [code]

    arXiv 2022/10

  • Unified Vision and Language Prompt Learning [pdf]

    arXiv 2022/10

  • MaPLe: Multi-modal Prompt Learning [pdf] [code]

    arXiv 2022/10

  • Multi-Prompt Alignment for Multi-source Unsupervised Domain Adaptation [pdf]

    arXiv 2022/10

Language-Interactable Prompt

A language-interactable prompter develops few-/zero-shot capabilities by prompting one or several independent foundation models (VLMs, LMs, VMs, etc.) through a language interface.
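
A minimal sketch of this pattern, assuming the Hugging Face `transformers` library and using illustrative checkpoints rather than the models from the papers below: an off-the-shelf captioner turns the image into text, and that text is inserted into a prompt for an independent language model.

```python
from transformers import pipeline

# Illustrative checkpoints; any captioner + causal LM pair fits this pattern.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
language_model = pipeline("text-generation", model="gpt2")

# 1) Vision-language model -> text: describe the image.
caption = captioner("example.jpg")[0]["generated_text"]

# 2) Language interface: prompt an independent LM with the caption.
prompt = (
    f"Image description: {caption}\n"
    "Question: What is happening in the image?\n"
    "Answer:"
)
answer = language_model(prompt, max_new_tokens=30)[0]["generated_text"]
print(answer)
```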

  • Multimodal Few-Shot Learning with Frozen Language Models [pdf]

    NeurIPS 2021

  • An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA [pdf] [code]

    AAAI 2022

  • A Good Prompt Is Worth Millions of Parameters? Low-resource Prompt-based Learning for Vision-Language Models [pdf]

    ACL 2022

  • VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning [pdf] [code]

    CVPR 2022

  • ClipCap: CLIP Prefix for Image Captioning [pdf] [code]

    arXiv 2021/11

  • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [pdf] [code]

    arXiv 2022/04

  • Flamingo: a Visual Language Model for Few-Shot Learning [pdf]

    arXiv 2022/04

  • Language Models Can See: Plugging Visual Controls in Text Generation [pdf] [code]

    arXiv 2022/05

  • Zero-Shot Video Question Answering via Frozen Bidirectional Language Models [pdf]

    arXiv 2022/06

  • Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning [pdf]

    arXiv 2022/06

Application of Prompt

This section contains papers that use prompt modules as tools, e.g., prompts for pre-training or for specific downstream applications.
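
As a toy example of a prompt used as a tool rather than as learned parameters, the sketch below (assuming OpenAI's `clip` package is installed; the image path and class names are placeholders) casts zero-shot classification as image-text matching by wrapping each class name in a hand-crafted template such as "a photo of a {class}".

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["cat", "dog", "car"]                    # placeholder label set
# Hand-crafted text prompts turn classification into image-text matching.
text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(class_names, probs[0].tolist())))
```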

  • Unifying Vision-and-Language Tasks via Text Generation [pdf] [code]

    ICML 2021

  • StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [pdf] [code]

    ICCV 2021

  • Grounded Language-Image Pre-training [pdf] [code]

    CVPR 2022

  • Align and Prompt: Video-and-Language Pre-training with Entity Prompts [pdf] [code]

    CVPR 2022

  • GroupViT: Semantic Segmentation Emerges from Text Supervision [pdf] [code]

    CVPR 2022

  • Unified Multimodal Pretraining and Prompt-based Tuning for Vision-Language Understanding and Generation [pdf]

    arXiv 2021/12

  • Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [pdf]

    arXiv 2022/08

Other Resources

  • PromptPapers: A comprehensive curated list of prompting papers (mainly in natural language processing)
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [pdf] arXiv 2021/07
