openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
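For context, a minimal usage sketch of that idea, assuming the `clip` pip package and the ViT-B/32 checkpoint (the image path and candidate captions below are placeholders, not from this issue):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder inputs: one image, three candidate text snippets.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog", "a photo of a cat", "a diagram"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)   # [1, 512] for ViT-B/32
    text_features = model.encode_text(text)      # [3, 512]
    # Cosine-similarity logits between the image and each text snippet.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # probability that each snippet matches the image
```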


How to convert the [1, 512] features obtained by the CLIP encoder

lwtgithublwt opened this issue

What exactly does the [1, 512] feature produced by the CLIP encoder represent, and how can it be turned into a feature map with channel, height, and width dimensions?
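For reference, a minimal sketch of the two kinds of features, assuming ViT-B/32, the `clip` package, and a PyTorch forward hook (the hook target and the 7x7 reshape are assumptions tied to that checkpoint, not something stated in this issue): `encode_image` returns the pooled, projected [1, 512] embedding, which has no spatial layout; a channels x height x width map has to be taken from the transformer's patch tokens instead.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # [1, 3, 224, 224]

# 1) The pooled embedding: one 512-d vector per image (CLS token -> ln_post -> projection).
with torch.no_grad():
    pooled = model.encode_image(image)
print(pooled.shape)  # torch.Size([1, 512])

# 2) Spatial patch features: capture the transformer output with a forward hook.
#    For ViT-B/32, a 224x224 input gives 7x7 = 49 patch tokens plus 1 CLS token, width 768.
features = {}

def hook(module, inputs, output):
    # CLIP's ViT transformer runs in (seq_len, batch, dim) order; put batch first.
    features["tokens"] = output.permute(1, 0, 2)  # -> [1, 50, 768]

handle = model.visual.transformer.register_forward_hook(hook)
with torch.no_grad():
    model.encode_image(image)
handle.remove()

patch_tokens = features["tokens"][:, 1:, :]                    # drop CLS -> [1, 49, 768]
feature_map = patch_tokens.permute(0, 2, 1).reshape(1, 768, 7, 7)
print(feature_map.shape)  # torch.Size([1, 768, 7, 7]) -> channels x height x width
```

Note that the [1, 512] vector itself cannot be reshaped into a meaningful grid; the spatial map above comes from an earlier layer, before CLIP's pooling and projection.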