jychoi118 / ilvr_adm

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)

Question about Training

XinYu-Andy opened this issue · comments

commented

Hi!
Thanks for your work, it's really interesting. I am trying to train the diffusion model from scratch, but the FID is unstable during training when I use the default settings of guided-diffusion. I suspect this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decayed the learning rate during your training?

I used a learning rate of 2e-5 for all datasets. I assume training at 256x256 resolution is unstable with a learning rate of 1e-4.
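
For reference, the learning rate is passed through image_train.py's --lr flag. The command below is only an illustration: datasets/ffhq is a placeholder path, and batch size 8 matches the setting mentioned later in this thread.

python scripts/image_train.py --data_dir datasets/ffhq --image_size 256 --lr 2e-5 --batch_size 8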

commented

I used a learning rate of 2e-5 for all datasets. I assume training at 256x256 resolution is unstable with a learning rate of 1e-4.

Thank you. Did you use any learning rate scheduler, or did you keep it constant during training?

I used a constant learning rate. I did not try any learning rate scheduler.

commented

I used a constant learning rate. I did not try any learning rate scheduler.

Thank you very much. By the way, may I ask how many training iterations are needed to produce "reasonable" results on FFHQ? (I understand that you trained for 1.2M iterations, as stated in the paper.) I have trained on this dataset for 400K iterations so far, but it only produces faces like this:
[attached: generated face samples from iterations 410K and 412K]

The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.

At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.
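
The reweighting referred to here is presumably the author's follow-up work on perception-prioritized (P2) weighting. As a rough illustrative sketch only (not this repo's actual code; model, alphas_cumprod, k, and gamma are placeholder assumptions), the simple epsilon-prediction loss can be multiplied by a per-timestep weight 1/(k + SNR_t)^gamma, which down-weights small t and shifts capacity toward large t:

import torch

# Illustrative P2-style reweighting of the simple diffusion loss.
# alphas_cumprod: 1-D tensor of cumulative alphas (length = diffusion steps)
# t: LongTensor of sampled timesteps, shape (batch,)
def p2_weight(alphas_cumprod, t, k=1.0, gamma=1.0):
    snr = alphas_cumprod[t] / (1.0 - alphas_cumprod[t])  # signal-to-noise ratio at t
    return 1.0 / (k + snr) ** gamma                      # small at small t, near 1 at large t

def reweighted_loss(model, x0, t, alphas_cumprod):
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise            # forward process q(x_t | x_0)
    mse = ((model(x_t, t) - noise) ** 2).mean(dim=(1, 2, 3))  # per-sample simple loss
    return (p2_weight(alphas_cumprod, t) * mse).mean()        # emphasize large (noisy) t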

Hi @jychoi118 @XinYu-Andy, I trained a new model on my own dataset using image_train.py and the following hyperparameters:

python scripts/image_train.py --data_dir datasets/art --image_size 64 --num_channels 128 --num_res_blocks 1 --diffusion_steps 100 --noise_schedule linear --lr 1e-4 --batch_size 32

and the following command to run the sampler:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 64 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output

But I am getting the following error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for UNetModel:

Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias",

One of the suggestions here is to increase the depth:
openai/guided-diffusion#7 (comment)
But I am not sure I understood what is meant, or whether it will solve the problem.

Could you please help?
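
One quick way to see where such a mismatch lies (an illustrative sketch, not part of this repo's scripts; it assumes the guided_diffusion package layout used here) is to rebuild the model with the training-time flags and diff its keys against the checkpoint:

import torch
from guided_diffusion.script_util import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

# Build the model with the SAME flags used at training time
# (values below mirror the training command above).
opts = model_and_diffusion_defaults()
opts.update(image_size=64, num_channels=128, num_res_blocks=1, diffusion_steps=100)
model, _ = create_model_and_diffusion(**opts)

ckpt = torch.load("models/ema_0.9999_540000.pt", map_location="cpu")
missing = sorted(set(model.state_dict()) - set(ckpt))
unexpected = sorted(set(ckpt) - set(model.state_dict()))
print("missing keys:", missing[:5])        # in the model, absent from the checkpoint
print("unexpected keys:", unexpected[:5])  # in the checkpoint, absent from the model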

Maybe it is because the hyperparameters you used for training and sampling are different. Try modifying your sampling command: use --diffusion_steps 100 and --resblock_updown False, and remove --num_head_channels. You can check the default hyperparameters here.
By the way, I recommend using --diffusion_steps 1000 for training.
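
Applying those suggestions to the sampling command above gives something like the following (all other flags kept as before; adjust the paths to your setup):

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 100 --dropout 0.0 --image_size 64 --learn_sigma True --noise_schedule linear --num_channels 128 --num_res_blocks 1 --resblock_updown False --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output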

Thank you, that worked. I also found your command in one of the other issues and it worked as well; the hyperparameters were mismatched in my case.

Eventually, both of these worked.

I used this command for training:
python scripts/image_train.py --data_dir datasets/art --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_res_blocks 1 --num_head_channels 64 --resblock_updown True --use_fp16 False --use_scale_shift_norm True

And this for sampling:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_430000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output

The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.

At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.

@jychoi118 Thanks for your explanation of the number of images seen during training! However, I wonder whether the "iteration" in batch_size x iteration is the same as the "step" in the log output when running the training code (as shown in the picture below).
[attached: screenshot of the training log]
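
(For scale: by that definition, the paper's setting quoted above, batch size 8 for 1M iterations, corresponds to 8,000,000 images seen, or roughly 114 passes over FFHQ's 70,000 images.)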