Why not use the CLIP image and text encoders?

Question

youngstu opened this issue a year ago

Why not use the CLIP image and text encoders? In Stable Diffusion, the text encoder is the CLIP text encoder, and the conditioning model is frozen while the diffusion model is trained. But in SDFusion, the text encoder is not the CLIP text encoder, and the text encoder's parameters are not frozen.
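To make the distinction concrete, here is a minimal PyTorch sketch of the two setups as I understand them. `TinyTextEncoder`, `denoiser`, `freeze`, and `trainable_params` are hypothetical stand-ins written only for illustration; they are not the actual CLIP, Stable Diffusion, or SDFusion code.

```python
import torch
from torch import nn


class TinyTextEncoder(nn.Module):
    """Hypothetical stand-in for a text encoder (not the real CLIP or SDFusion module)."""

    def __init__(self, vocab_size: int = 100, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(token_ids))


def freeze(module: nn.Module) -> nn.Module:
    """Put the module in eval mode and exclude its parameters from gradient updates."""
    module.eval()
    for p in module.parameters():
        p.requires_grad_(False)
    return module


def trainable_params(*modules: nn.Module) -> list:
    """Collect only the parameters that will receive gradients."""
    return [p for m in modules for p in m.parameters() if p.requires_grad]


# Hypothetical stand-in for the diffusion (denoising) network.
denoiser = nn.Linear(16, 16)

# Setup A (as in Stable Diffusion, per the description above):
# the text encoder is frozen, so only the denoiser is optimized.
frozen_encoder = freeze(TinyTextEncoder())
sd_style_params = trainable_params(denoiser, frozen_encoder)

# Setup B (as in SDFusion, per the description above):
# the text encoder is left trainable and is optimized jointly with the denoiser.
joint_encoder = TinyTextEncoder()
sdfusion_style_params = trainable_params(denoiser, joint_encoder)

# The optimizer only ever sees the parameters it is handed.
optimizer = torch.optim.AdamW(sdfusion_style_params, lr=1e-4)
```

In setup A the optimizer would receive only the denoiser's weights, while in setup B the text encoder's weights are updated as well. My question is why the second setup was chosen.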

Could you explain this? Thanks.