cloneofsimo / minDiffusion

Self-contained, minimalistic implementation of diffusion models with Pytorch.

Quality of the generated images

stsavian opened this issue · comments

Dear @cloneofsimo and @SilenceMonk, thanks so much for this code! It is very helpful and precisely the missing piece I needed to understand diffusion models better. I also appreciated that I could run the CIFAR10 training without any code modification.
I am playing around with your code to better understand the guided_diffusion repository, which I find too complex and need to simplify.

I have trained on cifar10 and obtained the following results after 100 epochs.
[image: ddpm_sample_cifar99]

As you can see, the prediction quality seems quite far from the ground truth.
I plan to extend your code to images with a larger resolution, however, I am hesitant now, as I do not understand if the network is learning or not. I would like to extend the code while maintaining convergence.

i) Is this behavior normal? Is there some critical hyperparameter to tune to obtain clearer images?

UPDATE: I have trained on celebA and obtained the following results after 21 epochs (approx 14 hours on a 3090):
[image: ddpm_sample_celeba021]
The celebA results already look better than the CIFAR10 ones, but I might need more training epochs because the generated images are still far from the ground truth.

Still referring to the celebA results, you can see in the following image that the generated images can collapse to a constant-color background.
(below you can see celebA after 19 epochs)
[image: ddpm_sample_celeba020]
This issue is similar to openai/guided-diffusion#81 .

Furthermore, the training does not progress monotonically: if we take epoch 22 of celebA, we can see that the network again outputs smooth predictions with no structure.

[image: ddpm_sample_celeba022]

So overall I am not getting the training stability I was expecting. These results are (unfortunately) consistent with my issues with the guided_diffusion repository, openai/guided-diffusion#42.

ii) Do you have any comment that could help overcome this issue?

Thanks again for your help!
Stefano

There are many more considerations to make when scaling these models to larger datasets like CelebA.

Please don't expect it to work stably on anything beyond the MNIST and perhaps CIFAR10 examples. I'm wondering why CIFAR10 didn't work for you, though; you should be able to reproduce the results in the readme.

Stuff that you should probably do to get this code to work beyond toy datasets:

  • A better optimizer, batch size, and noise scheduler
  • A better model, probably one less naïve than this one
  • Loss reweighting with a timestep distribution
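To make the "loss reweighting with a timestep distribution" point concrete: one common scheme (an assumption on my part, not something this repo implements) is min-SNR-γ weighting of the per-timestep epsilon-prediction loss, computed from the schedule's signal-to-noise ratio. A minimal sketch using the cosine schedule:

```python
import math

T = 1000  # number of diffusion steps (hypothetical choice)

def cosine_alpha_bar(t, T, s=0.008):
    # Cumulative signal fraction alpha_bar(t) for the cosine schedule.
    f = lambda u: math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)

def snr(t, T):
    # Signal-to-noise ratio at step t: alpha_bar / (1 - alpha_bar).
    ab = cosine_alpha_bar(t, T)
    return ab / (1.0 - ab)

def min_snr_weight(t, T, gamma=5.0):
    # min-SNR-gamma weight for the epsilon-prediction loss:
    #   w(t) = min(SNR(t), gamma) / SNR(t)
    # Early (low-noise) steps have huge SNR and get down-weighted;
    # late (high-noise) steps with SNR < gamma keep weight 1.0.
    s = snr(t, T)
    return min(s, gamma) / s

# Per-timestep weights to multiply into the MSE loss during training.
weights = [min_snr_weight(t, T) for t in range(1, T + 1)]
```

In training, you would multiply the per-sample MSE by `weights[t - 1]` for the timestep `t` drawn for that sample; `gamma` trades off how aggressively low-noise steps are suppressed.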

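Another common stabilizer for the epoch-to-epoch oscillation described above (again an assumption, not something minDiffusion is stated to ship) is to sample from an exponential moving average (EMA) of the model weights rather than the raw training weights. A minimal sketch over a flat parameter vector:

```python
class EMA:
    """Exponential moving average of a flat parameter vector:
    shadow <- decay * shadow + (1 - decay) * param after each step."""

    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # independent copy of the parameters

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1 - d) * p
                       for s, p in zip(self.shadow, params)]

# Hypothetical usage: after each optimizer step, call ema.update(...)
# with the current parameters, and generate samples from ema.shadow.
ema = EMA([0.0, 2.0], decay=0.9)
ema.update([1.0, 2.0])  # shadow moves 10% of the way toward the new values
```

With decay close to 1 (e.g. 0.999 or 0.9999), the sampled weights average over many recent optimizer steps, which typically smooths out exactly the kind of "good epoch, then structureless epoch" swings reported above.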
@cloneofsimo thanks so much for your advice!

I couldn't reproduce the results for CIFAR10. Strangely, running the code from this repo gives me only noise images (I'm trying to find the issue). Given that it works for MNIST, and the only difference is the UNet, I'd guess the problem is somewhere in there...

[image: ddpm_sample_cifar0]