How to change the [1, 512] features obtained by the CLIP encoder
lwtgithublwt commented
What exactly does the [1, 512] feature obtained by the CLIP encoder represent, and how is it turned into a feature map with channel, height, and width dimensions?
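
For context, here is a minimal sketch of one common way such a conversion can be done. It is an assumption about what is happening, not necessarily what this repository does. It reads the [1, 512] tensor as a single global embedding (batch size 1, 512 embedding dimensions, as in the ViT-B variants of CLIP) and copies that vector to every spatial position, which gives a [1, 512, H, W] map. The function name `tile_embedding`, the 7x7 spatial size, and the random stand-in values are all hypothetical, and NumPy is used only to keep the example self-contained.

```python
import numpy as np


def tile_embedding(embedding, height, width):
    """Broadcast a global [B, C] embedding to a [B, C, H, W] feature map."""
    batch, channels = embedding.shape
    # Add two singleton spatial axes: [B, C] -> [B, C, 1, 1]
    spatial = embedding[:, :, None, None]
    # Repeat the same vector at every spatial location: -> [B, C, H, W]
    return np.broadcast_to(spatial, (batch, channels, height, width)).copy()


# Stand-in for a CLIP embedding (random values, not real CLIP output)
embedding = np.random.default_rng(0).standard_normal((1, 512)).astype(np.float32)

feature_map = tile_embedding(embedding, height=7, width=7)
print(feature_map.shape)  # (1, 512, 7, 7)
```

With this approach every spatial location holds the same 512 values, so the map carries no spatial detail of its own. It is typically concatenated with, or used to modulate, another feature map that does.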