Why not use the CLIP image and text encoders?
youngstu opened this issue · comments
Why doesn't SDFusion use the CLIP image and text encoders? In Stable Diffusion, the text encoder is the CLIP text encoder, and the conditioning model is frozen while the diffusion model is trained. In SDFusion, however, the text encoder is not the CLIP text encoder, and its parameters are not frozen.
Could you explain why? Thanks.