MediaBrain-SJTU / LED

[CVPR2023] Leapfrog Diffusion Model for Stochastic Trajectory Prediction

Cannot reproduce results comparable to the paper

felix-yuxiang opened this issue

Hi, I ran the code on a single NVIDIA GeForce RTX 3090 GPU with the given config file listed in the paper. Here are my reproduced results, which are significantly different from the results in the README.md file. Can you guide me through what the issue could be? Can you also provide more info on how to train a model that matches the performance of the pre-trained model you provided in /checkpoints? Any help will be appreciated.

[screenshot: reproduced metrics]

I also got a similar result while reproducing the training.
[screenshot: reproduced metrics]

I think it's due to the hyperparameter settings. The paper says "With a frozen denoising module, we then train the leapfrog initializer for 200 epochs with an initial learning rate of 10^-4, decaying by 0.9 every 32 epochs", but the defaults in led_augment.yml do not match this.

I set them based on the paper and got:
[screenshot: reproduced metrics]
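
In case it helps anyone map the quoted schedule onto code, here is a minimal PyTorch sketch of that stage-two schedule (Adam at 1e-4, decayed by 0.9 every 32 epochs, 200 epochs total). The model here is a placeholder, not the repo's actual class, and in this repo these values would normally be set through led_augment.yml rather than hard-coded:

```python
import torch

# Placeholder model: stands in for LED's leapfrog initializer, which the
# paper trains while keeping the denoising module frozen.
model = torch.nn.Linear(16, 2)

# Schedule as quoted from the paper: initial lr 1e-4, multiplied by 0.9
# every 32 epochs, for 200 epochs total.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=32, gamma=0.9)

for epoch in range(200):
    # ... one training epoch over the trajectory batches would go here ...
    scheduler.step()  # lr is multiplied by 0.9 once every 32 epochs
```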


Hello, for the first-stage diffusion model, did you use the pre-trained model provided by the author, or did you train the first stage yourself according to the settings in the paper?


Hi, I used the provided pre-trained model for the first stage.

I think 0.83/1.69 was the only result I could reproduce.

Now I am able to reproduce their stage-one and LED stage-two results. The answer from @woyoudian2gou helped me a lot. But I would say it requires a non-trivial amount of engineering work to tune this well.

Could you share some insights with us? It would be helpful.


Yes, and the whole implementation is difficult to explain. I think the original author may have used a different method to get the pre-trained model.

@woyoudian2gou Hi, I have applied the hyperparameter settings you mentioned, but I still can't get a reasonable result. Could you share your config.yml with us? Thank you very much.


See https://github.com/MediaBrain-SJTU/LED/issues/6

@kkk00714 Thank you for your prompt reply. I would also like to know the hyperparameters for stage-2 training; could you share those? I would appreciate it.


The hyperparameters of stage 2 are the same as the author's original implementation (batch_size = 10, lr = 1e-4, ...).
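
For concreteness, a minimal PyTorch sketch of those stage-2 defaults (batch size 10, Adam at lr 1e-4); the dataset shapes and model below are illustrative placeholders, not the repo's actual classes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors standing in for the preprocessed trajectory data; the
# shapes here are illustrative, not the repo's actual data format.
dataset = TensorDataset(torch.randn(640, 8, 2), torch.randn(640, 12, 2))
loader = DataLoader(dataset, batch_size=10, shuffle=True)  # batch size 10

# Placeholder network; only the optimizer settings come from this thread.
model = torch.nn.Linear(8 * 2, 12 * 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```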