pengzhou1108 / RGB-N

Using Resnet_fusion_noise model.

limosin opened this issue

@pengzhou1108
I was trying to use the resnet_fusion_noise model so that the noise network is included as well. However, I am getting a "Nan in summary histogram" error after about 160 iterations. These are the training logs printed to the terminal:

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 596 to 655
iter: 40 / 30000, total loss: nan
 >>> rpn_loss_cls: nan
 >>> rpn_loss_box: nan
 >>> loss_cls: 0.439838
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 8.585s / iter

What am I doing wrong? Am I using the wrong network?
All I did was change res101_fusion to res101_fusion_noise inside the train_dis_faster.sh script. Is there anything else I need to do?
Please help. Thank you.

@limosin
Hi, 'res101_fusion' already includes the noise network, so you can use the default setting for training. Also, it seems that the training reached your pool size limit. Please check the GPU memory and reduce the RPN batch size accordingly.
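
While adjusting the batch size, it may help to see exactly which loss terms go non-finite and at which iteration. The snippet below is only a minimal sketch in plain Python: `find_nan_losses` is a hypothetical helper, not part of this repo, and the values are copied from the log posted above.

```python
import math


def find_nan_losses(losses):
    """Return the names of loss terms whose value is NaN or infinite.

    Hypothetical helper for illustration; it is not part of the RGB-N code.
    """
    return [name for name, value in losses.items() if not math.isfinite(value)]


# Loss values copied from the iteration-40 log lines in this issue.
losses = {
    "rpn_loss_cls": float("nan"),
    "rpn_loss_box": float("nan"),
    "loss_cls": 0.439838,
    "loss_box": 0.0,
}

bad = find_nan_losses(losses)
if bad:
    # In a real training loop one would probably stop here and inspect
    # the inputs, instead of continuing to train on NaN values.
    print("non-finite losses:", ", ".join(bad))
```

Calling something like this once per iteration on the fetched loss values would flag the first bad step, before the summary histogram error appears.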

Thanks a lot! That solves my concern.