cloneofsimo / minDiffusion

Self-contained, minimalistic implementation of diffusion models with Pytorch.

Quality of the generated images

stsavian opened this issue · comments

Dear @cloneofsimo and @SilenceMonk, thanks so much for this code! It is very helpful and precisely the missing piece I needed to understand diffusion models better. I also appreciated that I could run the CIFAR10 training without any code modification.
I am playing around with your code to better understand the guided_diffusion repository, which I find too complex and need to simplify.

I have trained on cifar10 and obtained the following results after 100 epochs.
[image: ddpm_sample_cifar99]

As you can see, the prediction quality seems quite far from the ground truth.
I plan to extend your code to images with a larger resolution, however, I am hesitant now, as I do not understand if the network is learning or not. I would like to extend the code while maintaining convergence.

i) Is this behavior normal? Is there some critical hyperparameter to tune to obtain clearer images?

UPDATE: I have trained on celebA and obtained the following results after 21 epochs (approx 14 hours on a 3090):
[image: ddpm_sample_celeba021]
The celebA results already look better than the CIFAR10 ones, but I might need more training epochs because the generated images are still far from the ground truth.

Still referring to the celebA results, you can see in the following image that the generated images can collapse to a constant-color background.
(below you can see celebA after 19 epochs)
[image: ddpm_sample_celeba020]
This issue is similar to openai/guided-diffusion#81 .

Furthermore, the training does not progress monotonically: if we take epoch 22 of celebA, we can see that the network again outputs smooth predictions with no structure.

[image: ddpm_sample_celeba022]

So overall I am not getting the training stability I was expecting. These results are (unfortunately) consistent with my issues with the guided_diffusion repository, openai/guided-diffusion#42.

ii) Do you have any comment that could help overcome this issue?

Thanks again for your help!
Stefano

There are many more considerations to make when scaling these models to larger datasets like CelebA.

Please don't expect it to work stably on anything beyond the MNIST and perhaps CIFAR10 examples. I'm wondering why CIFAR10 didn't work for you, though; you should be able to reproduce the results in the readme.

Stuff that you should probably do to get this code to work beyond toy datasets:

  • A better optimizer, batch size, and noise scheduler
  • A better model, probably one less naïve than this one
  • Loss reweighting with a timestep distribution
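To make the "loss reweighting with a timestep distribution" point concrete: one common scheme (an assumption on my part, not something this repo implements) is min-SNR-γ weighting of the per-timestep epsilon-prediction loss, computed from the schedule's signal-to-noise ratio. A minimal sketch using the cosine schedule:

```python
import math

T = 1000  # number of diffusion steps (hypothetical choice)

def cosine_alpha_bar(t, T, s=0.008):
    # Cumulative signal fraction alpha_bar(t) for the cosine schedule.
    f = lambda u: math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

def snr(t, T):
    # Signal-to-noise ratio at step t: alpha_bar / (1 - alpha_bar).
    ab = cosine_alpha_bar(t, T)
    return ab / (1.0 - ab)

def min_snr_weight(t, T, gamma=5.0):
    # min-SNR-gamma weight for the epsilon-prediction loss:
    #   w(t) = min(SNR(t), gamma) / SNR(t)
    # Early (low-noise) steps have huge SNR and get down-weighted;
    # late (high-noise) steps with SNR < gamma keep weight 1.0.
    s = snr(t, T)
    return min(s, gamma) / s

# Per-timestep weights to multiply into the MSE loss during training.
weights = [min_snr_weight(t, T) for t in range(1, T + 1)]
```

In training, you would multiply the per-sample MSE by `weights[t - 1]` for the timestep `t` drawn for that sample; `gamma` trades off how aggressively low-noise steps are suppressed.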

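Another common stabilizer for the epoch-to-epoch oscillation described above (again an assumption, not something minDiffusion is stated to ship) is to sample from an exponential moving average (EMA) of the model weights rather than the raw training weights. A minimal sketch over a flat parameter vector:

```python
class EMA:
    """Exponential moving average of a flat parameter vector:
    shadow <- decay * shadow + (1 - decay) * param after each step."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # independent copy of the parameters

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1 - d) * p
                       for s, p in zip(self.shadow, params)]

# Hypothetical usage: after each optimizer step, call ema.update(...)
# with the current parameters, and generate samples from ema.shadow.
ema = EMA([0.0, 2.0], decay=0.9)
ema.update([1.0, 2.0])  # shadow moves 10% of the way toward the new values
```

With decay close to 1 (e.g. 0.999 or 0.9999), the sampled weights average over many recent optimizer steps, which typically smooths out exactly the kind of "good epoch, then structureless epoch" swings reported above.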
@cloneofsimo thanks so much for your advice!

I couldn't reproduce the results for CIFAR10. Strangely, running the code from this repo gives me only noise images (I'm trying to find the issue). Given that it works for MNIST, and the only difference is the UNet, I'd guess the problem is somewhere in there...

[image: ddpm_sample_cifar0]