openai / glide-text2im

GLIDE: a diffusion-based text-conditional image synthesis model

About CLIP training on noised images

yufeng9819 opened this issue

Hey! I think GLIDE is wonderful work, but I have a question about CLIP training on noised images.

I want to understand why CLIP can be trained on noised images. If the timestep t (which ranges from 0 to 1000) is large, say around 500 or more, I would expect the noised images to contain hardly any semantic information. In that case, how does the CLIP model learn to encode similar features for the noised images and their text? I also think this could keep the model from converging, because it is hard to align features of heavily noised images with text.
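
To make the concern concrete, it may help to check how much of the clean image survives at a given t. The sketch below is only an illustration. It assumes a cosine noise schedule with 1000 steps, which may not be the schedule used for the noised CLIP model. The function names `cosine_alpha_bar` and `signal_to_noise` are made up for this example and are not part of the glide-text2im code. Under the usual forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, so sqrt(alpha_bar_t) is the scale of the remaining image signal.

```python
import math


def cosine_alpha_bar(t, num_steps=1000, s=0.008):
    """Fraction of variance still coming from the clean image at step t.

    Assumes a cosine schedule; this is an illustrative choice and not
    necessarily the schedule used to train the noised CLIP model.
    """
    def f(u):
        return math.cos((u / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    return f(t) / f(0)


def signal_to_noise(t, num_steps=1000):
    """Ratio of signal variance to noise variance at step t."""
    a = cosine_alpha_bar(t, num_steps)
    return a / (1.0 - a) if a < 1.0 else math.inf


if __name__ == "__main__":
    # Print the signal and noise scales in x_t for a few timesteps.
    for t in (0, 100, 250, 500, 750, 900, 1000):
        a = cosine_alpha_bar(t)
        print(
            f"t={t:4d}  signal_scale={math.sqrt(a):.3f}  "
            f"noise_scale={math.sqrt(1.0 - a):.3f}"
        )
```

Running this prints the signal and noise scales for several timesteps, which shows where along the schedule the image content actually becomes negligible under the assumed schedule. A different schedule, for example a linear one, would shift those numbers.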