SCEdit does not converge after 20k iterations on the COCO dataset.
serend1p1ty opened this issue
Sorry for the inconvenience this may have caused you. The problem could be that the learning rate is set too low, which might be affecting convergence. Our framework uses a batch learning rate to accommodate multi-GPU scenarios, calculated as "real_lr = yaml_lr * gpu_num * batch / 640". You can also see this in the log entries under "pg0_lr: 0.xxx". We plan to make adjustments to ensure better compatibility with this learning rate setting in the future.
For more information, please see: https://github.com/modelscope/scepter/blob/main/scepter/modules/solver/diffusion_solver.py#L136
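For illustration, here is a minimal sketch of how that batch learning rate works out in practice, based only on the formula quoted above (the function and variable names are illustrative, not the solver's actual attributes):

```python
def effective_lr(yaml_lr: float, gpu_num: int, batch_size: int) -> float:
    # Batch learning rate described above: real_lr = yaml_lr * gpu_num * batch / 640
    return yaml_lr * gpu_num * batch_size / 640

# Example (assumed setup): yaml_lr = 5e-5 on a single GPU with batch 16 yields a
# much smaller effective learning rate, which could explain slow convergence.
print(effective_lr(5e-5, gpu_num=1, batch_size=16))  # -> 1.25e-06
```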
@jiangzeyinzi Thanks for your reply. Can you tell me approximately what the final loss value should be?
My current value is 0.37. Is that normal?
In generative tasks, the loss often cannot serve as the primary indicator of model convergence. In our setup, the loss went from 0.16 to 0.14. The loss also varies noticeably with different data, base models, and condition types. In terms of results, under the settings in our paper (with a larger batch size), about 3k steps is generally enough for the generated results to be constrained by the conditional images.
Understood, I'll train a little more and see the results.
@jiangzeyinzi
After training for 50k steps, the model seems to converge.
For human subjects, however, generation still does not seem very stable.
I am a beginner in the field of text-to-image generation. Does the current result meet expectations?
Next, I plan to train SCEdit on the larger LAION dataset. Your paper mentions that you used a fixed learning rate of 5e-5. To reproduce the results in the paper, should I set the yaml learning rate to 0.000125 when the batch size is 256, since 0.000125 * 256 / 640 = 5e-5?
Looking forward to your reply.
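(For reference, a small sketch of the arithmetic behind that question, simply inverting the batch learning rate formula quoted earlier; the names are illustrative:)

```python
def yaml_lr_for(target_real_lr: float, gpu_num: int, batch_size: int) -> float:
    # Invert real_lr = yaml_lr * gpu_num * batch / 640 to get the yaml value
    # that produces a desired effective learning rate.
    return target_real_lr * 640 / (gpu_num * batch_size)

# Example (assumed setup): total batch 256, e.g. 8 GPUs x 32 per GPU, target 5e-5
print(yaml_lr_for(5e-5, gpu_num=8, batch_size=32))  # -> 0.000125
```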
I believe it meets expectations, as the COCO dataset contains a large number of images with multiple subjects and small faces, which are significant challenges in generation tasks. Training with higher-quality data is a good approach. Additionally, any reasonable learning rate will not cause a particularly large deviation in results; I recommend setting it to 5e-5 directly, without following the batch learning rate setting. I hope you achieve good results.
Thanks for your reply.