Using Resnet_fusion_noise model.
limosin opened this issue
@pengzhou1108
I was trying to use the resnet_fusion_noise model to include the noise network as well. However, I am getting a "Nan in summary histogram" error after about 160 iterations. These are the training logs printed to the terminal:
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 596 to 655
iter: 40 / 30000, total loss: nan
>>> rpn_loss_cls: nan
>>> rpn_loss_box: nan
>>> loss_cls: 0.439838
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 8.585s / iter
What am I doing wrong? Am I using the wrong network? All I did was change res101_fusion to res101_fusion_noise inside the train_dis_faster.sh script. Is there anything else I need to do?
Please help. Thank you.
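When chasing a "Nan in summary histogram" error, it helps to find the first logged iteration where a loss goes non-finite and which components are affected (in the log above, the RPN losses are NaN while loss_cls is still finite). The sketch below assumes only the log format shown in this issue (`iter: N / M, total loss: X` followed by `>>> name: value` lines); the function names `first_nan_iter` and `_is_finite` are hypothetical helpers, not part of the repository.

```python
import math
import re

# Header line for each logged iteration, e.g. "iter: 40 / 30000, total loss: nan"
ITER_RE = re.compile(r"iter:\s*(\d+)\s*/\s*\d+,\s*total loss:\s*(\S+)")
# Per-component line, e.g. ">>> rpn_loss_cls: nan"
COMPONENT_RE = re.compile(r">>>\s*(\w+):\s*(\S+)")


def _is_finite(text):
    """True if text parses as a finite float (float() accepts 'nan' and 'inf')."""
    try:
        return math.isfinite(float(text))
    except ValueError:
        return True  # not a number at all; treat it as harmless and ignore it


def first_nan_iter(log_text):
    """Return (iteration, [names of non-finite losses]) for the first logged
    iteration with a NaN/inf loss, or None if every logged loss is finite."""
    current_iter, bad = None, []
    for line in log_text.splitlines():
        header = ITER_RE.search(line)
        if header:
            if bad:  # the previous iteration block already contained a bad loss
                return current_iter, bad
            current_iter = int(header.group(1))
            bad = [] if _is_finite(header.group(2)) else ["total loss"]
            continue
        comp = COMPONENT_RE.search(line)
        # Only inspect loss entries; lines such as ">>> lr: ..." are skipped.
        if comp and current_iter is not None and "loss" in comp.group(1):
            if not _is_finite(comp.group(2)):
                bad.append(comp.group(1))
    return (current_iter, bad) if bad else None
```

Run on the log pasted above, this reports iteration 40 with `total loss`, `rpn_loss_cls` and `rpn_loss_box` non-finite, which points at the RPN branch rather than the classification head.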
@limosin
Hi, 'res101_fusion' already includes the noise network, so you can use the default setting for training. Also, it seems that training reached your pool size limit (see the pool_allocator message at the top of your log). Please check the GPU memory and reduce the RPN batch size accordingly.
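Reducing the RPN batch size is usually a config change rather than a code change. As an assumption, if this codebase follows tf-faster-rcnn, the RPN minibatch size is a config entry along the lines of `TRAIN.RPN_BATCHSIZE` that the training script can override with `KEY VALUE` pairs. The self-contained sketch below only illustrates that override mechanism: `DEFAULT_CFG`, its default of 256, and `apply_overrides` are hypothetical and not this repository's verified API.

```python
import ast
import copy

# Hypothetical defaults modeled on a tf-faster-rcnn-style config;
# the real key names and values in this repository may differ.
DEFAULT_CFG = {"TRAIN": {"RPN_BATCHSIZE": 256}}


def apply_overrides(cfg, pairs):
    """Return a copy of cfg with dotted KEY VALUE pairs applied,
    e.g. ["TRAIN.RPN_BATCHSIZE", "128"]."""
    if len(pairs) % 2 != 0:
        raise ValueError("overrides must be KEY VALUE pairs")
    new_cfg = copy.deepcopy(cfg)  # never mutate the defaults
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = key.split(".")
        node = new_cfg
        for part in parents:
            node = node[part]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(key)
        value = ast.literal_eval(raw)  # "128" -> 128, safely
        # Keep the original type so a typo cannot silently change semantics.
        if type(value) is not type(node[leaf]):
            raise TypeError(f"{key}: expected {type(node[leaf]).__name__}")
        node[leaf] = value
    return new_cfg
```

For example, `apply_overrides(DEFAULT_CFG, ["TRAIN.RPN_BATCHSIZE", "128"])` halves the assumed default while leaving `DEFAULT_CFG` untouched, and the type check rejects a value such as `"128.0"` for an integer field.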
Thanks a lot! That resolves my issue.