Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

Image encoder

typercast opened this issue

Is it possible to use a pre-trained image model from Hugging Face when trying to fine-tune? The latest models are usually there, so it would be pretty cool if it was compatible.

It should be, so long as inference works like it does for any normal nn.Module. Give it a try: alter the final embedding layer so its output dimension matches your text encoder's, and tell me how it goes!

I'll reopen this if you run into any problems. :)
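
For anyone trying this, here is a minimal sketch of the suggestion above: wrap an image backbone in an nn.Module and add a final linear layer that projects to the text encoder's embedding size. It is not code from this repo. `ProjectedImageEncoder`, `toy_backbone`, and the dimensions 16 and 512 are hypothetical names and values chosen for illustration, and the tiny convolutional backbone is only a stand-in so the example runs offline; a pre-trained Hugging Face model would take its place.

```python
import torch
from torch import nn


class ProjectedImageEncoder(nn.Module):
    """Hypothetical wrapper: backbone features -> embedding of the text encoder's size."""

    def __init__(self, backbone: nn.Module, feature_dim: int, embed_dim: int):
        super().__init__()
        self.backbone = backbone
        # Replacement "final embedding layer": maps backbone features
        # to the same dimension the text encoder produces.
        self.proj = nn.Linear(feature_dim, embed_dim, bias=False)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # Assumption: the backbone returns a (batch, feature_dim) tensor.
        features = self.backbone(images)
        return self.proj(features)


# Stand-in backbone (assumption) so the sketch runs without downloading weights.
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # global average pool -> (batch, 16, 1, 1)
    nn.Flatten(),             # -> (batch, 16)
)

text_embed_dim = 512  # hypothetical; use your text encoder's output size
image_encoder = ProjectedImageEncoder(toy_backbone, feature_dim=16, embed_dim=text_embed_dim)

with torch.no_grad():
    image_embeddings = image_encoder(torch.randn(4, 3, 224, 224))
```

One caveat: many Hugging Face vision models return an output object (for example, one with a `pooler_output` field) rather than a bare tensor, so the backbone would likely need a thin adapter module that extracts the pooled features before the projection.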