kakaobrain / rq-vae-transformer

The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)


Disappointed, the result is too poor; do all your demos come from the training set?

world2vec opened this issue · comments

Hi,
I just played with cc3m_cc12m_yfcc in your notebook, and the result for the simple text 'a man with black glass' is quite poor:

[attached sample: a man with black glass_temp_1.0_top_k_1024_top_p_0.95]

No, the demo samples are not from the training set; they are all generated.
You can check some examples generated by other people on Twitter, such as https://twitter.com/multimodalart/status/1513947558913187843?s=21&t=Ofu8oiHTE5_3keSX_IDAiQ.

You can adjust the top-k and top-p sampling parameters according to the text prompt, and the quality can vary with the prompt, since a 3.9B-parameter model trained on ~30M text-image pairs is still smaller than DALL-E.
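
For reference, here is a minimal, generic sketch of top-k / top-p (nucleus) filtering applied to logits before sampling. The function name `filter_logits` and the tensor shapes are illustrative assumptions, not the repository's actual API; lowering `top_k` / `top_p` makes sampling more conservative, raising them increases diversity.

```python
import torch
import torch.nn.functional as F

def filter_logits(logits: torch.Tensor, top_k: int = 1024, top_p: float = 0.95) -> torch.Tensor:
    """Mask unlikely tokens so sampling only draws from the top-k / top-p
    portion of the distribution. `logits` has shape (batch, vocab_size)."""
    if top_k > 0:
        # Keep only the k largest logits per row.
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p < 1.0:
        # Nucleus filtering: keep the smallest set of tokens whose
        # cumulative probability exceeds top_p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        remove = cum_probs > top_p
        remove[..., 1:] = remove[..., :-1].clone()  # always keep the best token
        remove[..., 0] = False
        remove = remove.scatter(-1, sorted_idx, remove)  # back to vocab order
        logits = logits.masked_fill(remove, float("-inf"))
    return logits

# Example: tighter top_k / top_p for more conservative samples.
logits = torch.randn(1, 16384)  # dummy logits; vocab size is illustrative
probs = F.softmax(filter_logits(logits, top_k=256, top_p=0.9), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```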