Question on how it actually works

Question

FerryHuang opened this issue a year ago

Thanks for developing such a great extension! I'm curious whether it is in fact an img2img process. In other words, does sampling start from the input image, pass through the latents, and finally produce the output image, so that XA operates on the latents generated from the input image together with the input words?