lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

What is the image input for inference?

SenHe opened this issue · comments

Thanks for this great work!

After going through the code, I have a couple of questions.

  1. In the first stage, training the discrete VAE, we already train a codebook. Why don't we reuse it in the second stage of training, instead of initializing a new codebook for the image tokens? (See the first sketch below.)

  2. During training we use the original image as input. At inference time, how is the image input set? Is it random noise of size 3x256x256? And how is the causal attention in the transformer handled during inference? (See the second sketch below.)
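To make the questions more concrete, here is a rough sketch of how I currently understand the two codebooks. Sizes and variable names are illustrative only, not the actual DALLE-pytorch defaults:

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the actual DALLE-pytorch defaults.
NUM_IMAGE_TOKENS = 8192   # size of the discrete VAE codebook
CODEBOOK_DIM     = 512    # VAE codebook embedding dimension
MODEL_DIM        = 1024   # transformer hidden dimension

# Stage 1: the discrete VAE learns a codebook used to quantize images.
vae_codebook = nn.Embedding(NUM_IMAGE_TOKENS, CODEBOOK_DIM)

# Stage 2, as I read the code: the transformer does not reuse the VAE codebook
# weights. It creates a fresh embedding over the same token ids, sized to the
# transformer's hidden dimension.
image_token_emb = nn.Embedding(NUM_IMAGE_TOKENS, MODEL_DIM)

# Only the token *indices* produced by the VAE encoder seem to be shared
# between the two stages (shapes assumed), e.g.:
# image_token_ids = vae.get_codebook_indices(images)   # (batch, 32 * 32)
# tokens = image_token_emb(image_token_ids)            # new embedding, not the VAE codebook
```

And here is how I imagine inference would have to work if there is no image input at all: the transformer samples image tokens autoregressively under the causal mask, and the VAE decoder turns them back into pixels. Function and method names below are placeholders, not the exact API:

```python
import torch

@torch.no_grad()
def generate_image(transformer, vae, text_tokens, num_image_tokens=32 * 32):
    # Start from the text tokens only -- no image and no noise is fed in.
    seq = text_tokens                              # (1, text_len)
    for _ in range(num_image_tokens):
        logits = transformer(seq)                  # causal mask: each position attends only to earlier ones
        probs = logits[:, -1].softmax(dim=-1)      # distribution over the next image token
        next_token = torch.multinomial(probs, 1)   # sample one token
        seq = torch.cat((seq, next_token), dim=1)
    image_token_ids = seq[:, text_tokens.shape[1]:]
    return vae.decode(image_token_ids)             # map token ids back to a 3x256x256 image
```

Is this understanding correct, or is the image input handled differently at inference time?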

After reading the code, I also don't understand why there is a separate new codebook. Do you have any idea now?