wwlCape / HAN

PyTorch code for our ECCV 2020 paper "Single Image Super-Resolution via a Holistic Attention Network"

HAN training is unstable

smiler96 opened this issue · comments

When I trained your HAN model, the loss exploded and the model collapsed. Have you run into this, or can you give me some guidance?
[image: training loss curve]

Hi, @smiler96
I used the command provided by the author (with the pre-trained RCAN model) and hit the same problem.
So, did you use the pre-trained RCAN.pt? It seems that training the whole model from scratch works fine.
Have you solved this problem? The owner seems to have abandoned this repo.

I did not use a pretrained model when I reimplemented HAN; you can find the code in my GitHub repo.
I solved this issue by adding a global residual connection. I think you can try it.

Thanks, @smiler96. Have you trained HAN to completion? What are its final results on the benchmarks? I merged HAN into the EDSR-pytorch repo (because my GPU can't support CUDA 8), and the first 20 epochs have shown no instability.

HAN uses a long residual connection as well. I'd like to know the difference between your method and the one the model owner provided, because they look the same except for one additional long residual connection.

I remember that the first several training epochs of the vanilla HAN were stable, as the figure above shows. But I have not figured out why the loss exploded.
My repo has the same HAN except for the long residual connection.
I have not calculated the PSNR values for each method, sorry about that.

OK, and thanks for your reply!

Hi, @smiler96, I also ran into instability during training. You said a global residual connection can resolve it. I'd like to know the difference between your repo and the one the model owner provided; I cannot find the specific operation in your repo.

global_res=True
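
For readers wondering what a flag like `global_res=True` might toggle, here is a minimal sketch of the general pattern. It is a guess, not code taken from either repository: `TinySR`, its layer sizes, and the bicubic skip path are all hypothetical, and only the flag name comes from this thread. The idea is that the body predicts a residual that is added to an upsampled copy of the input, so the network only has to learn a correction to the interpolated image.

```python
import torch
import torch.nn as nn


class TinySR(nn.Module):
    """Toy stand-in for an SR network (hypothetical, NOT the HAN architecture)."""

    def __init__(self, channels=3, feats=16, scale=2, global_res=True):
        super().__init__()
        self.global_res = global_res
        # Body: extract features, then upsample with a sub-pixel convolution.
        self.body = nn.Sequential(
            nn.Conv2d(channels, feats, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(feats, channels * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )
        # Skip path: plain bicubic upsampling of the low-resolution input.
        self.upsample = nn.Upsample(
            scale_factor=scale, mode="bicubic", align_corners=False
        )

    def forward(self, x):
        out = self.body(x)
        if self.global_res:
            # Global residual: the body's output is a correction to the bicubic image.
            out = out + self.upsample(x)
        return out


model = TinySR(global_res=True)
lr = torch.rand(1, 3, 8, 8)   # fake low-resolution batch
sr = model(lr)
print(sr.shape)               # torch.Size([1, 3, 16, 16])
```

With the flag off, the same module returns only the body's output, which makes the two variants easy to compare side by side.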

OK, thanks for your reply.

I did not face that issue. Maybe that's because I have gradient clipping turned on in the 'options.py' file.

Hi, how did you set '--gclip' in 'options.py'?
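
As a rough illustration of the gradient clipping being discussed, here is a minimal sketch. It assumes `--gclip` is a float threshold where 0 means no clipping, and that it is applied with `torch.nn.utils.clip_grad_value_` between `backward()` and `step()`. Check `options.py` and the trainer in your copy of the code for the actual default and usage. The toy model, the data, and the value 0.5 are made up for the example.

```python
import argparse

import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
# Assumed semantics: 0 disables clipping, any positive value is the threshold.
parser.add_argument("--gclip", type=float, default=0.0,
                    help="gradient clipping threshold (0 = no clipping)")
# Equivalent to passing `--gclip 0.5` on the command line.
args = parser.parse_args(["--gclip", "0.5"])

model = nn.Linear(4, 1)                     # toy model, not HAN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4) * 1e3                 # large inputs to provoke large gradients
target = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.l1_loss(model(x), target)
loss.backward()
if args.gclip > 0:
    # Clamp every gradient element to the range [-gclip, gclip] in place.
    torch.nn.utils.clip_grad_value_(model.parameters(), args.gclip)
max_grad = max(p.grad.abs().max().item() for p in model.parameters())
optimizer.step()

print(f"largest gradient magnitude after clipping: {max_grad:.3f}")
```

Clipping by value bounds each gradient element separately; `torch.nn.utils.clip_grad_norm_` is the alternative if you would rather bound the overall gradient norm.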

Hi, I am looking for the pre-trained RCAN model. Could you do me a favour and tell me how to find it (or just give me a link)? Thank you.

Hi, you can refer to the repo https://github.com/smiler96/Image-Super-Resolution