Using Resnet_fusion_noise model.

Question

limosin opened this issue 4 years ago · comments

@pengzhou1108
I was trying to use the resnet_fusion_noise model so that the noise network is included as well. However, I am getting a "Nan in summary histogram" error after about 160 iterations. These are the training logs that appear in the terminal:

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 596 to 655
iter: 40 / 30000, total loss: nan
 >>> rpn_loss_cls: nan
 >>> rpn_loss_box: nan
 >>> loss_cls: 0.439838
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 8.585s / iter

What am I doing wrong? Am I using the wrong network?
All I did was change res101_fusion to res101_fusion_noise inside the train_dis_faster.sh script. Is there anything else I need to do?
Please help. Thank you.

pengzhou1108 · Answer 1 · Fri Aug 14 2020 04:02:22 GMT+0800 (China Standard Time)

@limosin
Hi, 'res101_fusion' already includes the noise network, so you can use the default setting for training. Also, it seems that training hit your pool size limit (see the pool_allocator message in your log). Please check the GPU memory and reduce the RPN batch size accordingly.
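
To see which loss terms go NaN first, a log like the one above can be scanned with a small script. This is only a sketch: it assumes the log format matches the lines posted in the question, and the names `LOSS_LINE`, `nan_terms`, and `sample_log` are made up for illustration and are not part of the repository.

```python
import math
import re

# Matches lines such as " >>> rpn_loss_cls: nan" and the trailing
# ", total loss: nan" part of the "iter: ..." line (format assumed from the log above).
LOSS_LINE = re.compile(r"(?:>>>\s*|,\s*)([A-Za-z_ ]+):\s*(nan|[-+]?\d+(?:\.\d+)?)\s*$")


def nan_terms(log_text):
    """Return the names of logged values that are NaN, in order of appearance."""
    bad = []
    for line in log_text.splitlines():
        match = LOSS_LINE.search(line)
        if match and math.isnan(float(match.group(2))):
            bad.append(match.group(1).strip())
    return bad


# Log lines copied from the question.
sample_log = """iter: 40 / 30000, total loss: nan
 >>> rpn_loss_cls: nan
 >>> rpn_loss_box: nan
 >>> loss_cls: 0.439838
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 8.585s / iter"""

print(nan_terms(sample_log))  # ['total loss', 'rpn_loss_cls', 'rpn_loss_box']
```

On the posted log, only the two RPN losses (and therefore the total) are NaN, while loss_cls and loss_box are still finite.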

Somil Singhai · Answer 2 · Fri Aug 14 2020 15:51:59 GMT+0800 (China Standard Time)

Thanks a lot! That solves my concern.