naturomics / DLF

Code for reproducing results in "Generative Model with Dynamic Linear Flow"

Home Page: https://arxiv.org/abs/1905.03239

Imagenet dataset

yang-song opened this issue · comments

Could you double check that your small imagenet datasets are the same as http://image-net.org/small/train_32x32.tar, and http://image-net.org/small/valid_32x32.tar? As far as I know different preprocessing on imagenet can greatly affect the likelihood you get. For example, using the imagenet 32x32 dataset from https://patrykchrabaszcz.github.io/Imagenet32/ easily yields a bpd around 3.80 for flow models. It is weird that your model has such a big advantage on imagenet, but not on CIFAR-10

> Could you double check that your small imagenet datasets are the same as http://image-net.org/small/train_32x32.tar, and http://image-net.org/small/valid_32x32.tar?

Yes, it's from there. I used the scripts from the Glow repo to generate the tfrecord files, so the ImageNet dataset is the same as in Glow.
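If you want to double-check that two copies of the archives are byte-identical (a sanity check, not something this repo ships; the file names below are illustrative), a streaming hash comparison is enough:

```python
import hashlib

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large tar archives
    are hashed without loading them fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            h.update(block)
    return h.hexdigest()

# Example: compare a local copy against a freshly downloaded archive.
# sha256_of("train_32x32.tar") == sha256_of("train_32x32_reference.tar")
```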

> using the imagenet 32x32 dataset from https://patrykchrabaszcz.github.io/Imagenet32/ easily yields a bpd around 3.80

How many iterations did you run? 3.80 bits/dim is reasonable. As you might have noticed, our results in the paper are reported within 50 epochs, so the model is not fully converged (I suddenly realized how poor I am; I'm looking for an offer from anyone who can support me with hundreds of GPUs, wow). If you have GPUs available, we would welcome and appreciate it if you trained it for more iterations on the same dataset and reported the results back.
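For reference, bits/dim is just the average negative log-likelihood rescaled by the number of dimensions; a minimal sketch of the conversion (the image shape argument is illustrative for 32x32 RGB):

```python
import math

def bits_per_dim(nll_nats, image_shape=(32, 32, 3)):
    """Convert an average negative log-likelihood, in nats per image,
    to bits per dimension (bpd): divide by dim count and by ln(2)."""
    num_dims = 1
    for d in image_shape:
        num_dims *= d
    return nll_nats / (num_dims * math.log(2))
```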

> It is weird that your model has such a big advantage on imagenet, but not on CIFAR-10

Yes, it's weird. It's easy to overfit on CIFAR-10; we tried smaller models and regularization, but saw no improvement. We think this is because 1) CIFAR-10 is blurrier and has fewer samples than ImageNet 32x32; and 2) as discussed in our paper, the dynamic linear transformation learns to predict a mean and scale for each input, which suggests we may need more data to cover the dataset's distribution.

> different preprocessing on imagenet can greatly affect the likelihood you get

Confirmed, you're right. I tested this on CelebA 256x256 and ImageNet 32x32 with this repo by downsampling them (to 64x64 and 16x16, respectively) with different methods of the tf.image.resize_images API, and observed a gap of about 0.5 bits/dim between the different downsampling methods.
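A minimal NumPy sketch of why the downsampling method matters: nearest-neighbor subsampling and area (box) averaging, used here as stand-ins for the interpolation options in TensorFlow's resize ops, produce genuinely different pixel values, so a model's likelihood is evaluated on different data:

```python
import numpy as np

def downsample_nearest(img, factor):
    """Keep every factor-th pixel (nearest-neighbor style)."""
    return img[::factor, ::factor]

def downsample_area(img, factor):
    """Average each factor x factor block (area/box filtering)."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float64)

a = downsample_nearest(img, 2)
b = downsample_area(img, 2)

# The two 4x4 results generally differ pixel-wise, which is the root
# of the bits/dim gap between preprocessing pipelines.
print(np.abs(a - b).mean())
```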

Our results reported in the paper were obtained with the same preprocessing as in Glow, so there is no issue there.