OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

why not use self.clip.transformer when training?

adda1221 opened this issue · comments

Hi, thank you for your exciting work! I noticed that you use self.clip.transformer for processing images during the inference stage. However, during the training stage, image processing is accomplished using torchvision. Are there any differences between these two methods? Thanks for your reply!

Hi @adda1221, we do not use clip.transformer (the text encoder of CLIP) to process images during inference. In the training stage, we use simple torchvision transforms, which are the same as CLIP's preprocessing transforms.
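For reference, a minimal sketch of CLIP-style image preprocessing built from torchvision transforms (the image path and the 224-pixel resolution are assumptions for illustration; the actual training code may differ):

```python
import torch
from PIL import Image
from torchvision import transforms

# CLIP's standard image preprocessing: bicubic resize, center crop,
# and normalization with CLIP's published mean/std statistics.
clip_preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.48145466, 0.4578275, 0.40821073),
        std=(0.26862954, 0.26130258, 0.27577711),
    ),
])

# Hypothetical input image; produces a tensor of shape [3, 224, 224].
image = clip_preprocess(Image.open("example.jpg").convert("RGB"))
```

Since the transforms match CLIP's own preprocessing, the image tensors produced during training are consistent with what the CLIP visual encoder expects at inference time.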