mahyarnajibi / SSH

SSH: Single Stage Headless Face Detector

Feature map fusion without channel reduction?

Hi @po0ya and @mahyarnajibi,

First, thank you for the discussion in my previous post! I definitely learned a lot from you.

I have a question about channel reduction and would be glad if you could share your suggestions. The paper says: "To decrease the memory consumption of the model, the number of channels in the feature map is reduced from 512 to 128 using 1 x 1 convolutions."
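To put rough numbers on the memory argument in that quote, the arithmetic below compares a 3x3 convolution and its float32 output map at 512 versus 128 channels. It is only a sketch: the spatial size `H, W` is a hypothetical conv4_3 resolution, and the helper names are illustrative, not from the SSH code. Cutting channels by 4x shrinks each activation map by 4x and a 3x3 convolution's weights by about 16x, because the weight count scales with input channels times output channels.

```python
# Rough sizing for the quoted 512 -> 128 channel reduction (float32, batch of 1).
# H and W below are a hypothetical conv4_3 resolution; the real one depends on the input image.

BYTES_PER_FLOAT32 = 4

def conv3x3_params(in_ch, out_ch):
    """Number of weights plus biases in a 3x3 convolution."""
    return in_ch * out_ch * 3 * 3 + out_ch

def feature_map_bytes(channels, height, width):
    """Bytes needed to store one float32 feature map of shape (channels, height, width)."""
    return channels * height * width * BYTES_PER_FLOAT32

if __name__ == "__main__":
    H, W = 100, 150  # hypothetical
    for ch in (512, 128):
        mib = feature_map_bytes(ch, H, W) / 2**20
        print(f"{ch} channels: {conv3x3_params(ch, ch):,} conv params, {mib:.1f} MiB per map")
```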
I am trying to run an experiment in which only feature map fusion is done, without reducing the channels. However, doing so gives me exploding gradients. One possible fix is to reduce the learning rate, but I would rather not do that yet, since I suspect I may have implemented the prototxt incorrectly or missed something I am not aware of. Did you also encounter a nan loss in your experiments when the channels were not reduced?

Iteration 20 (1.09985 iter/s, 18.1843s/20 iters), loss = nan
Train net output #0: m1@ssh_cls_loss = 87.3365 (* 1 = 87.3365 loss)
Train net output #1: m1@ssh_reg_loss = nan (* 1 = nan loss)
Train net output #2: m2@ssh_cls_loss = 87.3365 (* 1 = 87.3365 loss)
Train net output #3: m2@ssh_reg_loss = nan (* 1 = nan loss)
Train net output #4: m3@ssh_cls_loss = 87.3365 (* 1 = 87.3365 loss)
Train net output #5: m3@ssh_reg_loss = nan (* 1 = nan loss)
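The value 87.3365 in this log is not arbitrary. Assuming the classification loss is Caffe's standard SoftmaxWithLoss layer, that layer clamps the true-class probability at `FLT_MIN` (the smallest normalized float32, 2^-126) before taking the log, so the per-sample loss can never exceed -ln(FLT_MIN). The short check below reproduces that ceiling; `softmax_loss` is a hypothetical helper that mimics the clamping, not Caffe code.

```python
import math

# FLT_MIN from C's <float.h>: the smallest positive normalized float32, 2**-126.
FLT_MIN = 2.0 ** -126

def softmax_loss(logits, label):
    """Softmax cross-entropy for one sample, clamping the true-class
    probability at FLT_MIN the way Caffe's SoftmaxWithLoss does."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for stability
    prob = exps[label] / sum(exps)
    return -math.log(max(prob, FLT_MIN))

# Upper bound of the clamped loss.
print(f"{-math.log(FLT_MIN):.4f}")  # 87.3365

# Saturated logits: the true class gets (numerically) zero probability.
print(f"{softmax_loss([0.0, 1000.0], 0):.4f}")  # 87.3365
```

If the loss is averaged over samples, a value of exactly 87.3365 suggests that every sample is saturated, which typically happens once activations have blown up; the nan regression loss points the same way.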

Prototxt modification is as follows:

#========== CONV4 backwards for M1 ==========

# Upsample conv5_3
layer {
  name: "conv5_3_up"
  type: "Deconvolution"
  bottom: "conv5_3"
  top: "conv5_3_up"
  convolution_param {
    kernel_size: 4
    stride: 2
    num_output: 512
    group: 512
    pad: 1
    weight_filler: { type: "bilinear" }
    bias_term: false
  }
  param { lr_mult: 0 decay_mult: 0 }
}

# Crop conv5_3
layer {
  name: "conv5_3_crop"
  type: "Crop"
  bottom: "conv5_3_up"
  bottom: "conv4_3"
  top: "conv5_3_crop"
  crop_param {
    axis: 2
    offset: 0
  }
}

# Eltwise summation
layer {
  name: "conv4_fuse"
  type: "Eltwise"
  bottom: "conv5_3_crop"
  bottom: "conv4_3"
  top: "conv4_fuse"
  eltwise_param {
    operation: SUM
  }
}

# Perform final 3x3 convolution
layer {
  name: "conv4_fuse_final"
  type: "Convolution"
  bottom: "conv4_fuse"
  top: "conv4_fuse_final"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
layer {
  name: "conv4_fuse_final_relu"
  type: "ReLU"
  bottom: "conv4_fuse_final"
  top: "conv4_fuse_final"
}
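To make the data flow of this prototxt concrete, here is a NumPy sketch of what the first three layers compute on a single image, with arrays shaped (channels, height, width) and no batch axis. It assumes a fixed bilinear kernel (lr_mult 0), a per-channel deconvolution (group equals num_output), and a top-left crop; the function names are hypothetical, and the sketch stops at the sum, before the 3x3 convolution and ReLU. For kernel_size 4 the kernel's 1-D weights are [0.25, 0.75, 0.75, 0.25], which is what Caffe's bilinear filler produces for that size.

```python
import numpy as np

def bilinear_kernel(k):
    """2-D bilinear upsampling kernel of size k x k."""
    f = int(np.ceil(k / 2.0))
    c = (k - 1) / (2.0 * f)
    w1d = 1.0 - np.abs(np.arange(k) / f - c)
    return np.outer(w1d, w1d)

def depthwise_deconv(x, kernel, stride, pad):
    """Transposed convolution applying the same 2-D kernel to every channel
    independently (group == num_output == channels), without bias."""
    c, h, w = x.shape
    k = kernel.shape[0]
    full = np.zeros((c, (h - 1) * stride + k, (w - 1) * stride + k), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            # Each input pixel scatters a scaled copy of the kernel into the output.
            full[:, i * stride:i * stride + k, j * stride:j * stride + k] += x[:, i:i + 1, j:j + 1] * kernel
    # `pad` trims that many rows/columns from every border of the full output.
    return full[:, pad:full.shape[1] - pad, pad:full.shape[2] - pad]

def fuse_conv4(conv5_3, conv4_3):
    """Deconvolution (x2 bilinear) -> Crop (offset 0) -> Eltwise SUM."""
    up = depthwise_deconv(conv5_3, bilinear_kernel(4), stride=2, pad=1)
    h, w = conv4_3.shape[1:]
    cropped = up[:, :h, :w]  # Caffe's axis=2 is the height axis; here it is axis 1
    return cropped + conv4_3
```

With kernel 4, stride 2, and pad 1 the upsampled map is exactly twice the input size, and the crop only matters when conv4_3 has an odd height or width. Note that in this variant the fused map is the raw sum of two 512-channel VGG activations, with no 1x1 layer in between to rescale either branch.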

I get the same error when training SSH on a new dataset: the losses are 87 and nan. Do you have any experience with this? I have tried changing the NMS and setting the threshold for positive samples to 0.8. After that the loss changes and decreases, but the trained model performs badly at test time.
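The positive-sample threshold usually refers to IoU-based anchor labeling. The sketch below uses hypothetical helper names and default thresholds (not taken from the SSH code) and continuous box coordinates. It shows why raising the positive threshold to 0.8 can leave far fewer positives: only anchors that overlap a face almost exactly still qualify.

```python
import numpy as np

def iou(anchors, gt):
    """IoU between each anchor (N, 4) and each ground-truth box (M, 4).
    Boxes are (x1, y1, x2, y2) in continuous coordinates."""
    x1 = np.maximum(anchors[:, None, 0], gt[None, :, 0])
    y1 = np.maximum(anchors[:, None, 1], gt[None, :, 1])
    x2 = np.minimum(anchors[:, None, 2], gt[None, :, 2])
    y2 = np.minimum(anchors[:, None, 3], gt[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt, pos_thresh=0.5, neg_thresh=0.3):
    """Label anchors 1 (positive), 0 (negative) or -1 (ignored).
    The default thresholds are hypothetical."""
    best = iou(anchors, gt).max(axis=1)  # best overlap with any face
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[best < neg_thresh] = 0
    labels[best >= pos_thresh] = 1
    return labels

anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 9], [20, 20, 30, 30]], dtype=float)
faces = np.array([[0, 0, 10, 10]], dtype=float)
print(label_anchors(anchors, faces))                   # [1 1 0]
print(label_anchors(anchors, faces, pos_thresh=0.95))  # [ 1 -1  0]
```

Anchors that fall between the two thresholds are ignored during training. Many detectors also force the best-matching anchor of each ground-truth box to be positive; this sketch omits that step.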

Hi @xiaofanglegoc, did you modify the network in any way besides switching to another dataset? You can also try gradient clipping, as @po0ya suggested; that solved the problem I faced. :)

@loackerc I have not changed the network; I only changed lib/dataset/ to use my new dataset, which I have organized in PASCAL VOC format. The network takes input from

@po0ya could you please give more details on the gradient clipping? Thanks!
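For reference, BVLC Caffe's solver supports gradient clipping through the `clip_gradients` field in solver.prototxt (assuming the Caffe build used with SSH includes it). When the field is set, the SGD solver computes the global L2 norm over all learnable parameters' gradients, and if that norm exceeds the threshold it scales every gradient by threshold / norm. The NumPy sketch below mirrors that rule. The function name is hypothetical, and any threshold you put in the solver (for example `clip_gradients: 10`) is a value to tune, not a recommendation from this thread.

```python
import numpy as np

def clip_gradients(grads, clip_norm):
    """Global L2-norm clipping, applied in place to a list of float arrays.
    Returns the norm measured before clipping."""
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total_norm > clip_norm:
        scale = clip_norm / total_norm
        for g in grads:
            g *= scale  # every gradient shrinks by the same factor
    return total_norm

grads = [np.array([3.0]), np.array([4.0])]  # global norm 5
norm = clip_gradients(grads, clip_norm=1.0)
print(norm, grads)
```

Scaling all gradients by one shared factor keeps the update direction unchanged and only limits its size, which is why clipping can stop the exploding updates without lowering the learning rate.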