How to transform the [1, 512] features obtained by the CLIP encoder

Question

lwtgithublwt opened this issue 13 days ago

What exactly does the [1, 512] feature obtained by the CLIP encoder represent, and how does it become a feature map with channel, height, and width dimensions?
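To make the shapes in the question concrete, here is a minimal sketch of two generic ways a [1, 512] vector can be turned into a [1, C, H, W] tensor. It is not taken from this repository's code. The random vector stands in for a real CLIP embedding, and `H`, `W`, `C`, and the projection matrix `proj` are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a CLIP embedding: a single 512-dimensional vector.
# (Random values here, not a real encoder output.)
feat = rng.standard_normal((1, 512)).astype(np.float32)  # [batch, dim]

# Hypothetical target spatial size and channel count.
H, W = 8, 8
C = 4

# Option A (assumption): copy the same vector to every spatial location,
# so the 512 embedding dimensions become the channel axis.
tiled = np.broadcast_to(feat[:, :, None, None], (1, 512, H, W))

# Option B (assumption): project the vector to C*H*W values, then reshape.
# `proj` is a placeholder for weights that would normally be learned.
proj = rng.standard_normal((512, C * H * W)).astype(np.float32)
grid = (feat @ proj).reshape(1, C, H, W)

print(tiled.shape)  # shape of the broadcast feature map
print(grid.shape)   # shape of the projected feature map
```

In option A every spatial position holds the identical 512-dimensional vector, so no spatial information is created. In option B any spatial structure would have to come from the learned projection. Which of these, if either, the model in this repository uses would need to be confirmed from its source.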