Zero-shot or fine-tune?

Question

jingzhengli opened this issue 2 years ago · comments

To my knowledge, CLIP can be applied directly to zero-shot learning (i.e., to unseen/novel classes).
CoOp and CoCoOp do not appear to be zero-shot methods, since they require fine-tuning. However, I cannot find the details of how the fine-tuning is done in the paper. Am I misunderstanding something? I would also like to know how CLIP itself is fine-tuned.
I also cannot understand Figure 1 in the paper: why can the performance of CoOp and CoCoOp be compared with zero-shot learning?
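
For context, here is a minimal sketch of the distinction being asked about. It assumes, as the CoOp paper describes, that CLIP's encoders stay frozen and only the prompt's context vectors are trained on a few labelled examples. Everything in it is a hypothetical toy: the 3-dimensional embeddings, the additive "text encoder", and the names `text_feature`, `learn_context`, and `hand_crafted` are illustrative stand-ins, not the real CLIP or CoOp code. Gradients are estimated by finite differences only to keep the sketch free of dependencies; an actual implementation would use backpropagation.

```python
import math

# Toy, frozen class-name embeddings (hypothetical stand-ins for CLIP token embeddings).
CLASS_EMB = {"cat": [1.0, 0.0, 0.2], "dog": [0.0, 1.0, 0.2]}

def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def text_feature(context, class_name):
    """Frozen toy 'text encoder': context vector + class-name embedding, L2-normalised."""
    return normalise([c + e for c, e in zip(context, CLASS_EMB[class_name])])

def logits(image_feat, context, classes, scale=10.0):
    """Scaled cosine similarity between one image feature and every class prompt."""
    img = normalise(image_feat)
    return [scale * sum(a * b for a, b in zip(img, text_feature(context, c)))
            for c in classes]

def predict(image_feat, context, classes):
    lg = logits(image_feat, context, classes)
    return classes[lg.index(max(lg))]

def loss(context, data, classes):
    """Mean cross-entropy over (image_feature, label_index) pairs."""
    total = 0.0
    for image_feat, label in data:
        lg = logits(image_feat, context, classes)
        m = max(lg)
        log_z = m + math.log(sum(math.exp(x - m) for x in lg))
        total += log_z - lg[label]
    return total / len(data)

def learn_context(context, data, classes, lr=0.1, steps=100, eps=1e-5):
    """Prompt learning: update ONLY the context vector; the encoders never change."""
    ctx = list(context)
    for _ in range(steps):
        grad = []
        for i in range(len(ctx)):
            up = ctx[:i] + [ctx[i] + eps] + ctx[i + 1:]
            down = ctx[:i] + [ctx[i] - eps] + ctx[i + 1:]
            # Central finite difference as a dependency-free gradient estimate.
            grad.append((loss(up, data, classes) - loss(down, data, classes)) / (2 * eps))
        ctx = [c - lr * g for c, g in zip(ctx, grad)]
    return ctx

classes = ["cat", "dog"]
hand_crafted = [0.0, 0.0, 1.0]  # fixed context, playing the role of "a photo of a"
few_shot = [([0.9, 0.1, 0.5], 0), ([0.2, 0.8, 0.5], 1)]  # toy labelled image features

# Zero-shot: no training, just compare the image with hand-crafted prompts.
zero_shot_prediction = predict([0.9, 0.1, 0.5], hand_crafted, classes)

# CoOp-style: start from the same context and tune it on the few-shot data.
learned = learn_context(hand_crafted, few_shot, classes)
```

In this toy, both settings run through the same frozen encoders and the same similarity-based classifier; they differ only in whether the context is written by hand or learned. That shared pipeline is one way to read why the two kinds of result can sit side by side in a comparison.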

Jingzheng Li · Answer 1 · Tue Sep 13 2022 10:23:53 GMT+0800 (China Standard Time)

Thanks for the great work.
I understand now.