lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch


How to improve image fidelity?

soon-yau opened this issue · comments

[Image: side-by-side comparison. Left: generated. Right: from dataset]
I have trained on a mannequin dataset and the results look quite good. However, the generated images are a bit blurry and fine details are lost, so I wonder what changes I need to make to produce crisper output.

I currently use a VQGAN pretrained on ImageNet. I have also tried training a VAE from scratch (using the default train_vae.py), but its output is blurry. I tried increasing the number of layers, the number of tokens, etc., but saw no improvement, and training became somewhat less stable. Any advice on which VAE parameters to change?
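One factor worth checking is the VAE's downsampling factor: it sets the size of the latent token grid the transformer models, so a smaller factor preserves finer detail at the cost of a longer sequence. A minimal sketch of that arithmetic (the helper function and the 128-token text length are illustrative assumptions, not part of the repo):

```python
def dalle_seq_len(image_size: int, downsample_factor: int, text_seq_len: int) -> int:
    """Total transformer sequence length: text tokens plus the
    flattened grid of image tokens produced by the VAE."""
    grid = image_size // downsample_factor  # latent grid is (grid x grid)
    return text_seq_len + grid * grid

# f16 VQGAN at 256px: a 16x16 grid, i.e. 256 image tokens
print(dalle_seq_len(256, 16, 128))  # 384 total tokens
# f8 VQGAN (e.g. vqgan_gumbel_f8_8192) at 256px: a 32x32 grid, 1024 image tokens
print(dalle_seq_len(256, 8, 128))   # 1152 total tokens
```

Moving from f16 to f8 quadruples the number of image tokens, which tends to improve fidelity but makes DALL-E training noticeably more expensive.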

I managed to train a mannequin-only model with vqgan_gumbel_f8_8192 that produces quality matching OpenAI's.
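For anyone trying to reproduce this, a plausible way to point DALLE-pytorch's training script at a taming-transformers Gumbel VQGAN checkpoint looks roughly like the following; the checkpoint/config paths and dataset folder are placeholders, and flag names should be checked against your local copy of train_dalle.py:

```shell
# Sketch only: --taming selects the taming-transformers VQGAN backend;
# the .ckpt/.yaml paths below are placeholders for your downloaded
# vqgan_gumbel_f8_8192 files, and --image_text_folder points at your dataset.
python train_dalle.py \
    --taming \
    --vqgan_model_path ./vqgan_gumbel_f8_8192.ckpt \
    --vqgan_config_path ./vqgan_gumbel_f8_8192.yaml \
    --image_text_folder ./mannequin_dataset
```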

[Image: generated mannequin sample (mannequin_female_2)]

commented

@soon-yau Hi, I am trying to train a model using vqgan_gumbel_f8_8192 as you did, but I get terrible results. What model parameters did you use to get this result, in particular the number of layers and the dimensions? Also, could you please provide a link to the mannequin dataset? Thanks!