rshaojimmy / MADDoG

[CVPR 2019] Pytorch codes for Multi-adversarial Discriminative Deep Domain Generalization for Face Presentation Attack Detection

Some questions about model initialization

wky1998 opened this issue · comments

Thanks for your open-source code.
However, I ran into a problem when initializing the model:

File "/home/wky/3d-face-anti-spoofing/code/CVPR2019-MADDoG-master/misc/utils.py", line 48, in weights_init_xavier
init.uniform_(m.weight.data, 1.0, 0.02)
File "/home/wky/virtual/face-anti-spoofing/lib/python3.5/site-packages/torch/nn/init.py", line 88, in uniform_
return no_grad_uniform(tensor, a, b)
File "/home/wky/virtual/face-anti-spoofing/lib/python3.5/site-packages/torch/nn/init.py", line 14, in no_grad_uniform
return tensor.uniform_(a, b)
RuntimeError: Expected a_in <= b_in to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

When I swap 0.02 and 1.0 it runs fine, but I'm not sure whether that affects the results.
Another question: Kaiming initialization is usually considered more suitable for the ReLU activation function, yet your code uses Xavier initialization. Could you explain why you chose it?
Thanks.
Looking forward to your reply.
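For reference, a minimal sketch of what the initializer may have intended. This is a hypothetical reconstruction, not the repo's actual `misc/utils.py`: the failing call `init.uniform_(m.weight.data, 1.0, 0.02)` has its bounds reversed on newer PyTorch, and for BatchNorm weights the common DCGAN-style convention is a normal distribution with mean 1.0 and std 0.02, which is one plausible fix:

```python
import torch
import torch.nn as nn
from torch.nn import init

def weights_init_xavier(m):
    """Hypothetical reconstruction of the init routine discussed above."""
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        init.xavier_normal_(m.weight.data)
    elif classname.find('Linear') != -1:
        init.xavier_normal_(m.weight.data)
    elif classname.find('BatchNorm') != -1:
        # Assumption: the original uniform_(1.0, 0.02) call meant the
        # DCGAN-style BatchNorm init, i.e. normal with mean 1.0, std 0.02.
        init.normal_(m.weight.data, 1.0, 0.02)
        init.constant_(m.bias.data, 0.0)

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
net.apply(weights_init_xavier)
```

Simply swapping the two arguments (`uniform_(0.02, 1.0)`) also runs, but it samples BatchNorm weights uniformly over a wide range, which is a different distribution and could plausibly change results.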

Hi,

Our code is based on PyTorch 0.4.0. Different PyTorch versions handle initialization differently, which may affect the results.

Xavier is the most widely used initialization method.

Thanks.
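As a side note on the Xavier-vs-Kaiming question: in PyTorch, Xavier initialization accepts a `gain` argument, and with the ReLU gain (√2) its variance scaling coincides with Kaiming (He) initialization whenever fan-in equals fan-out. A small sketch (the tensor shapes are illustrative, not from the repo):

```python
import torch
from torch.nn import init

w = torch.empty(256, 256)

# Plain Xavier assumes a roughly linear activation (gain = 1).
init.xavier_normal_(w)

# Xavier with the ReLU gain: std = sqrt(2) * sqrt(2 / (fan_in + fan_out)).
init.xavier_normal_(w, gain=init.calculate_gain('relu'))

# Kaiming (He) init: std = sqrt(2 / fan_in). For a square weight matrix
# (fan_in == fan_out) this equals the Xavier-with-ReLU-gain std above.
init.kaiming_normal_(w, nonlinearity='relu')
```

So the practical gap between the two schemes is mainly the fan term used in the denominator, not a fundamentally different recipe.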

Thanks for your reply.
I have another question: how much data do you use for training, i.e., how many frames of each video are used for training (or pre-training)? During my training, loss_critic1, loss_critic2, loss_critic3, and Loss_triplet always go to 0, while loss_generator(1,2,3) keeps approaching 1. Is this because the generator always learns more slowly than the discriminator? How can I avoid it? I have tried adjusting the learning rate, and updating the generator 5 times per discriminator update, but the results are similar.
Thanks.
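One common remedy for a critic that saturates (losses going to 0) while the generator stalls, besides the learning-rate and update-ratio tweaks already tried, is one-sided label smoothing on the critic's real targets. A minimal sketch with toy tensors; the names are illustrative and not taken from the MADDoG code:

```python
import torch
import torch.nn as nn

# Toy critic outputs (a batch of logits), standing in for one of the critics.
critic_logits_real = torch.randn(16, 1)

bce = nn.BCEWithLogitsLoss()

# Hard targets (1.0) let the critic become over-confident quickly,
# which can starve the generator of useful gradients.
loss_hard = bce(critic_logits_real, torch.ones(16, 1))

# One-sided label smoothing: real targets of 0.9 instead of 1.0 keep the
# critic from saturating, a common trick to rebalance adversarial training.
loss_smooth = bce(critic_logits_real, torch.full((16, 1), 0.9))
```

This is a generic GAN-training heuristic, not something the authors confirm using here, so treat it as one experiment among others.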

Hello, have you solved this problem? I have encountered the same situation. Could you give some suggestions?