ucinlp / autoprompt

AutoPrompt: Automatic Prompt Construction for Masked Language Models.

Question on using this in conjunction with CLIP / Open_CLIP / VQGAN

johndpope opened this issue

mlfoundations/open_clip#1

@gabrielilharco suggested this research may help with my problem above.
Basically, I want to introspect an image to shed light on what prompts would be appropriate for recreating a similar image.

Hi @johndpope,

This is an interesting idea! No guarantees, but this definitely could work, although you will probably get better prompts with multiple query images (not sure "query" is the right word).

I think the best way to use AutoPrompt for your application would be to copy the relevant lines of code to the open_clip training script. Pretty much everything you need is contained in https://github.com/ucinlp/autoprompt/blob/master/autoprompt/create_trigger.py. The main things you'll need are:

  • The GradientStorage object that registers a backward hook to store the gradients of the loss w.r.t. the individual prompt tokens.
  • The hotflip_attack function to find the candidate token updates.
  • Something in your training loop that approximates these lines from create_trigger.py. Basically the steps are: 1. measure the loss, 2. get candidate prompt modifications, 3. use some additional training data to check which candidate is best. We found step 3 was necessary for getting our prompts to generalize, which is why I recommend having multiple query images, but maybe this isn't needed for your application... IDK. (See the sketch after this list.)
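To make those three steps concrete, here is a minimal sketch of how the pieces could fit together for this CLIP use case. This is not the AutoPrompt code itself: `GradientStorage` and `hotflip_candidates` paraphrase the ideas in create_trigger.py, and the objective (negative image-text cosine similarity against query images), the helpers `prompt_loss` and `flip_one_token`, and the specific open_clip model/pretrained tags are all assumptions for this application.

```python
import torch
import open_clip

# Assumed open_clip-style setup; the model name and pretrained tag are
# illustrative, not something this thread specifies.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
model.eval()

embeddings = model.token_embedding  # text tower token embeddings (nn.Embedding)


class GradientStorage:
    """Stores the gradient of the loss w.r.t. a module's output via a
    backward hook (here, the token embedding layer's output)."""

    def __init__(self, module):
        self._grad = None
        module.register_full_backward_hook(self._hook)

    def _hook(self, module, grad_in, grad_out):
        self._grad = grad_out[0]  # [batch, seq_len, embed_dim]

    def get(self):
        return self._grad


storage = GradientStorage(embeddings)  # register the hook once


def hotflip_candidates(grad_at_pos, embedding_matrix, num_candidates=10):
    """HotFlip-style first-order search: score every vocabulary token by the
    dot product of its embedding with the gradient at one prompt position,
    and return the tokens predicted to decrease the loss the most."""
    with torch.no_grad():
        scores = embedding_matrix @ grad_at_pos        # [vocab_size]
        _, top_ids = scores.topk(num_candidates, largest=False)
    return top_ids


def prompt_loss(token_ids, image_features):
    """Assumed objective for this application: negative cosine similarity
    between the prompt's text features and (precomputed, already normalized)
    query image features."""
    text_features = model.encode_text(token_ids)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    return -(text_features @ image_features.T).mean()


def flip_one_token(prompt_ids, pos, train_feats, heldout_feats):
    """One step over the three stages above: 1. measure the loss,
    2. get candidate token swaps at `pos` (which should index a trigger
    token, not BOS/EOS), 3. keep whichever swap scores best on held-out
    query images so the prompt generalizes."""
    model.zero_grad()
    loss = prompt_loss(prompt_ids, train_feats)        # step 1
    loss.backward()

    grad = storage.get()[0, pos]                       # gradient at flip position
    candidates = hotflip_candidates(grad, embeddings.weight)  # step 2

    best_ids = prompt_ids
    with torch.no_grad():
        best_loss = prompt_loss(prompt_ids, heldout_feats).item()
    for cand in candidates:                            # step 3
        trial = prompt_ids.clone()
        trial[0, pos] = cand
        with torch.no_grad():
            trial_loss = prompt_loss(trial, heldout_feats).item()
        if trial_loss < best_loss:
            best_ids, best_loss = trial, trial_loss
    return best_ids
```

Starting from something like `open_clip.tokenize(["a photo of a thing"])` and normalized `model.encode_image` features for your query images, you would call `flip_one_token` repeatedly over the trigger positions until the held-out loss stops improving.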

Hope this helps! I'll leave this issue open for a couple days in case you have any follow up questions.

Best,

@rloganiv

Oh, and to avoid licensing issues, I just licensed this code base under Apache 2.0, so you should be free to copy and alter the code for open_clip however you see fit.

Thanks Robert,
I'm going to defer to this repo by Jamie Kiros (@dzryk) instead:
https://github.com/dzryk/clip-grams