LTH14 / mage

A PyTorch implementation of MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

More information about VQGAN

gaopengpjlab opened this issue

Can you release a pretrained VQGAN with more parameters and higher resolution? By the way, can you share the FID score of your pretrained VQGAN?

We do not use VQGANs with different architectures in our paper, and the VQGAN is trained on ImageNet 256x256. You can get the pre-trained VQGAN here. The FID of the VQGAN's reconstructed images is around 3.
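
For reference, a minimal sketch of the reconstruction round trip that this FID measures, assuming a taming-transformers-style `VQModel` interface (encode returns quantized latents plus codebook indices; decode maps latents back to pixels). `load_vqgan` and the file names are hypothetical placeholders, not this repo's actual API:

```python
import torch
from PIL import Image
import torchvision.transforms as T

# load_vqgan is a hypothetical helper standing in for however the repo
# builds its VQModel from the released checkpoint.
vqgan = load_vqgan("vqgan_checkpoint.pth").eval()

tf = T.Compose([T.Resize(256), T.CenterCrop(256), T.ToTensor()])
x = tf(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 256, 256)
x = 2.0 * x - 1.0  # taming-style models expect inputs in [-1, 1]

with torch.no_grad():
    # taming convention: encode -> (quantized latents, emb loss, info tuple)
    quant, _, (_, _, indices) = vqgan.encode(x)
    recon = vqgan.decode(quant)  # back to pixel space, still in [-1, 1]

recon = (recon.clamp(-1, 1) + 1) / 2  # rescale to [0, 1] for viewing/saving
```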

Can you release a stronger tokenizer and a high-resolution tokenizer?

Such as a VQGAN trained on ImageNet 512x512?

The MAGE paper only contains results on ImageNet 256x256. We did not train a tokenizer at a resolution of 512x512.

For the "stronger" tokenizer, can you specify which one you refer to? We only have two tokenizers. One is trained with "strong augmentation", which is the tokenizer we used in this repo. The other one is trained with "weak augmentation". That one is in JAX and you can get the pre-trained weights here.

"Stronger" means a tokenizer with a resolution of 512x512. I am planning to scale MAGE to a larger resolution, namely 512x512.

I see. Unfortunately, we did not train a tokenizer on 512x512. You can also check MaskGIT; in the MaskGIT repo, they release a tokenizer trained on ImageNet 512x512.
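
One practical note on the 512x512 plan: with the usual stride-16 tokenizer, doubling the input resolution quadruples the token sequence length, which is the main cost of the move. A quick back-of-envelope check:

```python
# Token count vs. input resolution for a stride-16 tokenizer.
def num_tokens(resolution: int, stride: int = 16) -> int:
    side = resolution // stride
    return side * side

print(num_tokens(256))  # 256  tokens (16x16 grid, as in MAGE)
print(num_tokens(512))  # 1024 tokens (32x32 grid, a 4x longer sequence)
```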

Thank you so much for your kind reply.

The FID of the VQGAN's reconstructed images is around 3.

Is the FID score reported on the ImageNet train split or the val split?

All FID scores in the paper are reported on the ImageNet val split.
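
For anyone reproducing this number: reconstruction FID compares the VQGAN's reconstructions of the val images against the val images themselves. Below is a minimal sketch using torchmetrics' FrechetInceptionDistance, one FID implementation among several; exact numbers vary with the implementation and preprocessing. `val_loader` and `reconstruct` are hypothetical stand-ins:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# val_loader: hypothetical DataLoader yielding uint8 (N, 3, 256, 256) batches.
# reconstruct: hypothetical wrapper around the encode/decode round trip
# sketched earlier, returning floats in [0, 1].
fid = FrechetInceptionDistance(feature=2048)

for real_uint8 in val_loader:
    fid.update(real_uint8, real=True)
    recon = reconstruct(real_uint8)
    fid.update((recon * 255).to(torch.uint8), real=False)

print(fid.compute())  # the paper reports ~3; exact value is implementation-dependent
```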

Thank you very much. I asked because the original VQGAN paper reports both train and val FID scores.

No worries