jychoi118 / ilvr_adm

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)

Question about Training

XinYu-Andy opened this issue · comments

commented

Hi!
Thanks for your work, it's really interesting. I am trying to train the diffusion model from scratch, but the FID is unstable during training when I use the default settings of guided-diffusion. I suspect this may be caused by the fixed learning rate (this repo seems to fix the learning rate at 1e-4 during training), so I am wondering whether you decayed the learning rate during your training?

I used a learning rate of 2e-5 for all datasets. I assume training at 256x256 resolution is unstable with a learning rate of 1e-4.
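
For reference, the learning rate is passed through image_train.py's --lr flag. The command below is only an illustration: datasets/ffhq is a placeholder path, and batch size 8 matches the setting mentioned later in this thread.

python scripts/image_train.py --data_dir datasets/ffhq --image_size 256 --lr 2e-5 --batch_size 8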

commented

I used a learning rate of 2e-5 for all datasets. I assume training at 256x256 resolution is unstable with a learning rate of 1e-4.

Thank you. Did you use any learning rate scheduler, or did you keep it constant during training?

I used a constant learning rate. I did not try any learning rate scheduler.

commented

I used a constant learning rate. I did not try any learning rate scheduler.

Thank you very much. By the way, may I ask how many training iterations are needed to produce "reasonable" results on FFHQ? (I understand that you trained for 1.2M iterations, as stated in the paper.) I have trained on this dataset for 400K iterations so far, but it only produces faces like this:
[attached: generated face samples from iterations 410K and 412K]

The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.

At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.
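
The reweighting referred to here is presumably the author's follow-up work on perception-prioritized (P2) weighting. As a rough illustrative sketch only (not this repo's actual code; model, alphas_cumprod, k, and gamma are placeholder assumptions), the simple epsilon-prediction loss can be multiplied by a per-timestep weight 1/(k + SNR_t)^gamma, which down-weights small t and shifts capacity toward large t:

import torch

# Illustrative P2-style reweighting of the simple diffusion loss.
# alphas_cumprod: 1-D tensor of cumulative alphas (length = diffusion steps)
# t: LongTensor of sampled timesteps, shape (batch,)
def p2_weight(alphas_cumprod, t, k=1.0, gamma=1.0):
    snr = alphas_cumprod[t] / (1.0 - alphas_cumprod[t])  # signal-to-noise ratio at t
    return 1.0 / (k + snr) ** gamma                      # small at small t, near 1 at large t

def reweighted_loss(model, x0, t, alphas_cumprod):
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise            # forward process q(x_t | x_0)
    mse = ((model(x_t, t) - noise) ** 2).mean(dim=(1, 2, 3))  # per-sample simple loss
    return (p2_weight(alphas_cumprod, t) * mse).mean()        # emphasize large (noisy) t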

Hi @jychoi118 @XinYu-Andy, I trained a new model on my own dataset using image_train.py and the following hyperparameters:

python scripts/image_train.py --data_dir datasets/art --image_size 64 --num_channels 128 --num_res_blocks 1 --diffusion_steps 100 --noise_schedule linear --lr 1e-4 --batch_size 32

and the following command to run the sampler:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 64 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output

But I am getting the following error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for UNetModel:

Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias",

One of the suggestions here is to increase the depth:
openai/guided-diffusion#7 (comment)
But I am not sure I understood what is meant, or whether it will solve the problem.

Could you please help?
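
One quick way to see where such a mismatch lies (an illustrative sketch, not part of this repo's scripts; it assumes the guided_diffusion package layout used here) is to rebuild the model with the training-time flags and diff its keys against the checkpoint:

import torch
from guided_diffusion.script_util import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

# Build the model with the SAME flags used at training time
# (values below mirror the training command above).
opts = model_and_diffusion_defaults()
opts.update(image_size=64, num_channels=128, num_res_blocks=1, diffusion_steps=100)
model, _ = create_model_and_diffusion(**opts)

ckpt = torch.load("models/ema_0.9999_540000.pt", map_location="cpu")
missing = sorted(set(model.state_dict()) - set(ckpt))
unexpected = sorted(set(ckpt) - set(model.state_dict()))
print("missing keys:", missing[:5])        # in the model, absent from the checkpoint
print("unexpected keys:", unexpected[:5])  # in the checkpoint, absent from the model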

Maybe it is because the hyperparameters you used for training and sampling are different. Try modifying your sampling command: use --diffusion_steps 100 and --resblock_updown False, and remove --num_head_channels. You can check the default hyperparameters here.
By the way, I recommend using --diffusion_steps 1000 for training.
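
Applying those suggestions to the sampling command above gives something like the following (all other flags kept as before; adjust the paths to your setup):

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 100 --dropout 0.0 --image_size 64 --learn_sigma True --noise_schedule linear --num_channels 128 --num_res_blocks 1 --resblock_updown False --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_540000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output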

Thank you, that worked. I also found your command in one of the other issues and it worked as well; the hyperparameters were mismatched in my case.

Eventually, both of these worked.

I used this command for training:
python scripts/image_train.py --data_dir datasets/art --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_res_blocks 1 --num_head_channels 64 --resblock_updown True --use_fp16 False --use_scale_shift_norm True

And this for sampling:

python scripts/ilvr_sample.py --attention_resolutions 16 --class_cond False --diffusion_steps 1000 --dropout 0.0 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_head_channels 64 --num_res_blocks 1 --resblock_updown True --use_fp16 False --use_scale_shift_norm True --timestep_respacing 100 --model_path models/ema_0.9999_430000.pt --base_samples ref_imgs/war --down_N 32 --range_t 20 --save_dir output

The training progress depends on the number of images seen during training, which is batch_size x iteration. For faster convergence at high resolution, I recommend reweighting the training objective so that the model focuses on learning at large t. You may refer to this paper and code.

At the time of writing the ILVR paper, 1M iterations with batch size 8 produced good samples.

@jychoi118 Thanks for your explanation of the number of images seen during training! However, I wonder whether the "iteration" in batch_size x iteration is the same as the "step" in the log output when running the training code (as shown in the picture below).
[attached: screenshot of the training log]
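
(For scale: by that definition, the paper's setting quoted above, batch size 8 for 1M iterations, corresponds to 8,000,000 images seen, or roughly 114 passes over FFHQ's 70,000 images.)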