BUG: weights inside CLIP_L(...) don't parse correctly
asagi4 opened this issue
Thank you!
clip_l and clip_g get concatenated with torch.cat, as seen in https://github.com/comfyanonymous/ComfyUI/blob/6d281b4ff4ad3918a4f3b4ca4a8b547a2ba3bf80/comfy/sdxl_clip.py#L52-L56
So which weights do you use for the final conditioning?
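A toy, pure-Python sketch of what the linked lines appear to do: each encoder yields one vector per token, and the two outputs are joined along the feature dimension, token by token. Real SDXL code calls torch.cat on tensors with 768 (CLIP-L) and 1280 (CLIP-G) features per token. The tiny sizes and the name concat_features are made up for illustration.

```python
from typing import List

Vector = List[float]


def concat_features(l_out: List[Vector], g_out: List[Vector]) -> List[Vector]:
    """Join per-token CLIP-L and CLIP-G vectors along the feature axis.

    Hypothetical stand-in for torch.cat([l_out, g_out], dim=-1): both inputs
    must have the same number of tokens, and each output vector is that
    token's L features followed by its G features.
    """
    if len(l_out) != len(g_out):
        raise ValueError("clip_l and clip_g must yield the same number of tokens")
    return [l_vec + g_vec for l_vec, g_vec in zip(l_out, g_out)]


# Two tokens; 2 "L" features and 3 "G" features per token (toy sizes).
l_out = [[0.1, 0.2], [0.3, 0.4]]
g_out = [[1.0, 1.1, 1.2], [1.3, 1.4, 1.5]]
emb = concat_features(l_out, g_out)
print(len(emb), len(emb[0]))  # 2 5
```

Because the join is a plain per-token concatenation, nothing in this step assigns a single weight to the combined result.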
@mizukarada As far as I can tell, that encoding happens outside anything my nodes touch. The logic is the same as in CLIPTextEncodeSDXL: the node just produces the l and g tokens and then calls clip.encode_from_tokens (or its equivalent from ADV_CLIP_emb if that's in use). What that does is up to ComfyUI.
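A minimal sketch of the hand-off described above, assuming a whitespace tokenizer and a hypothetical padding token so that both lists end up the same length for a per-token join. The names toy_tokenize, build_sdxl_tokens, encode_node and PAD are illustrative only; in ComfyUI the real call at the end is clip.encode_from_tokens.

```python
from typing import Callable, Dict, List

Tokens = Dict[str, List[str]]

PAD = "<pad>"  # hypothetical padding token for this sketch


def toy_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: whitespace split (real CLIP tokenizers use BPE)."""
    return text.split()


def build_sdxl_tokens(text_l: str, text_g: str) -> Tokens:
    """Tokenize the l and g prompts separately and pad them to equal length."""
    tokens = {"l": toy_tokenize(text_l), "g": toy_tokenize(text_g)}
    width = max(len(tokens["l"]), len(tokens["g"]))
    for key in ("l", "g"):
        tokens[key] += [PAD] * (width - len(tokens[key]))
    return tokens


def encode_node(text_l: str, text_g: str, encode: Callable[[Tokens], object]) -> object:
    """Build the token dict, then hand it off; what `encode` does is not the node's concern."""
    return encode(build_sdxl_tokens(text_l, text_g))


tokens = build_sdxl_tokens("a red cat", "cat")
print(tokens)  # {'l': ['a', 'red', 'cat'], 'g': ['cat', '<pad>', '<pad>']}
```

The node's only job in this sketch is building the {"l": ..., "g": ...} dict; whatever the encode callback does with it stays outside the node.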
@asagi4 For example:
clip_l: cat AND dog :1.2
clip_g: apple AND orange :1.3
clip_l and clip_g get merged into one tensor; we'll call it emb. What is emb's weight? Is it 1.2, 1.3, or the average of the two? Mind you, there can also be multiple weights in one text.
@mizukarada I don't think you can do that with my nodes. AND-combining of prompts is processed after any clip_l / clip_g separation, since the l / g distinction exists only at the token level and disappears once the tokens have been encoded into tensors. (There are still the "pooled" vs. non-pooled tensors, but my nodes basically treat them identically.)
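A rough sketch of the splitting step in the order described above, under stated assumptions: the whole prompt is split on AND, and each part's optional trailing ":weight" is peeled off before that part is encoded on its own. Under that reading, a weight like 1.2 belongs to one sub-prompt's whole l+g embedding, not to clip_l or clip_g separately. The function split_and_weights, the regex, and the default weight of 1.0 are hypothetical simplifications, not the nodes' actual parser; the encode and combine steps are left out.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for an optional trailing weight such as " :1.2".
WEIGHT_RE = re.compile(r"\s*:\s*(\d+(?:\.\d+)?)\s*$")


def split_and_weights(prompt: str) -> List[Tuple[str, float]]:
    """Split a prompt on the keyword AND, then peel off each part's ':weight'.

    Parts without an explicit weight get 1.0 (an assumption of this sketch).
    """
    parts: List[Tuple[str, float]] = []
    for chunk in re.split(r"\bAND\b", prompt):
        text = chunk.strip()
        weight = 1.0
        match = WEIGHT_RE.search(text)
        if match:
            weight = float(match.group(1))
            text = text[: match.start()].strip()
        parts.append((text, weight))
    return parts


# Each (text, weight) pair would then be encoded on its own, with the l/g
# handling and concatenation inside that per-part step; the weight attaches
# to the whole part, not to clip_l or clip_g.
print(split_and_weights("cat AND dog :1.2"))  # [('cat', 1.0), ('dog', 1.2)]
```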
AND inside the CLIP_L function doesn't make sense. CLIP_L(foo AND bar) will essentially be parsed as two separate prompts, "CLIP_L(foo" and "bar)".
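A minimal illustration of that split, assuming (as a simplification) that AND is matched as a whole word anywhere in the text, with no awareness of parentheses. naive_and_split is a hypothetical name, not the nodes' parser.

```python
import re
from typing import List


def naive_and_split(prompt: str) -> List[str]:
    """Split on the keyword AND without tracking parentheses (hypothetical)."""
    return [part.strip() for part in re.split(r"\bAND\b", prompt)]


parts = naive_and_split("CLIP_L(foo AND bar)")
print(parts)  # ['CLIP_L(foo', 'bar)']
# Neither part is a complete CLIP_L(...) call, so the function no longer
# wraps both halves of the original text.
```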