Image encoder

Question

typercast opened this issue 3 years ago

Is it possible to use a pre-trained image model from Hugging Face when trying to fine-tune? The latest models are usually there, so it would be pretty cool if it was compatible.

Cade Gordon · Answer 1 · Thu Jan 13 2022 09:42:21 GMT+0800 (China Standard Time)

It should be, so long as inference works like any normal nn.Module. Give it a try: alter the final embedding layer so its output dimension matches your text encoder's, and tell me how it goes!
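Roughly what I have in mind, as a minimal sketch rather than anything from this repo: wrap the backbone in an nn.Module and add a new final linear layer that projects to the text encoder's embedding width. `ProjectedImageEncoder`, `toy_backbone`, and the sizes 8 and 512 are all made up for illustration. The toy backbone stands in for a pre-trained model so the snippet runs without downloading weights. The `pooler_output` fallback assumes a Hugging Face model whose output object exposes pooled features of shape (batch, feature_dim); check that for whichever model you pick.

```python
import torch
import torch.nn as nn


class ProjectedImageEncoder(nn.Module):
    """Wraps a feature-extracting backbone and projects to the text embedding width."""

    def __init__(self, backbone: nn.Module, feature_dim: int, embed_dim: int):
        super().__init__()
        self.backbone = backbone
        # New final layer: maps backbone features to the text encoder's embedding size.
        self.proj = nn.Linear(feature_dim, embed_dim, bias=False)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features = self.backbone(images)
        # Assumption: a Hugging Face model returns an output object, not a tensor,
        # and exposes pooled features as `pooler_output` with shape (batch, feature_dim).
        if not isinstance(features, torch.Tensor):
            features = features.pooler_output
        return self.proj(features)


# Hypothetical stand-in backbone: conv -> global average pool -> flatten to (batch, 8).
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# embed_dim=512 is an illustrative value; use your text encoder's actual width.
encoder = ProjectedImageEncoder(toy_backbone, feature_dim=8, embed_dim=512)
embeddings = encoder(torch.randn(2, 3, 224, 224))
print(embeddings.shape)  # torch.Size([2, 512])
```

The projection layer is freshly initialized, so it needs training even if the backbone weights are pre-trained.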

I'll reopen this if you run into any problems. :)