yuhuayc / da-faster-rcnn

An implementation of our CVPR 2018 work 'Domain Adaptive Faster R-CNN for Object Detection in the Wild'

Weight (lambda) value for image level adaptation and hyperparameter request

sehyun03 opened this issue

Hi, I found that the weight for the image-level adaptation loss in "train.prototxt" is set to 1.0, which is not consistent with your paper (where all lambdas are set to 0.1):

layer {
  name: "da_conv_loss"
  type: "SoftmaxWithLoss"
  bottom: "da_score_ss"
  bottom: "da_label_ss_resize"
  top: "da_conv_loss"
  loss_param {
    ignore_label: 255
    normalize: 1
  }
  propagate_down: 1
  propagate_down: 0
  loss_weight: 1
}

Also "lr_mult" for instance level domain classifier have 10 times more value than other conv of fc.

layer {
  name: "dc_ip3"
  type: "InnerProduct"
  bottom: "dc_ip2"
  top: "dc_ip3"
  param {
    lr_mult: 10
  }
  param {
    lr_mult: 20
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      # std: 0.3
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}

Can you provide the exact hyperparameters for "loss_weight", "lr_mult", and "gradient_scaler_param" that you used in the paper?
It would be appreciated if you could share the hyperparameters for each setting (image-level DA, image + instance-level DA, image + instance-level DA + consistency loss) and each dataset pair (Sim10k -> Cityscapes, Cityscapes -> Foggy Cityscapes, KITTI <-> Cityscapes). Thank you.

Hi,

I am trying to get familiar with the code too, and I ran into similar questions. I think you can find your lambda in the GradientScaler layers that implement the GRLs: they scale the gradient by a factor of -0.1, which effectively gives the right gradient, at least for the Faster R-CNN part of the network. The DA part sees a loss of (L_img + L_ins + L_cst), without the minus sign and without the factor lambda. I think this is desirable for training the adversarial (DA) part.
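For reference, this is roughly what such a GRL looks like in the prototxt. The layer and blob names below are placeholders (not copied from train.prototxt), and I have not verified the exact fields of gradient_scaler_param in this Caffe fork, so read it only as a sketch of where the -0.1 sits:

layer {
  # sketch only: name and bottom/top blobs are placeholders
  name: "da_grl"
  type: "GradientScaler"
  bottom: "conv5_3"
  top: "da_grl"
  # the forward pass is the identity; in the backward pass the gradient flowing
  # into the base network is multiplied by -0.1, i.e. the -lambda of the paper
  gradient_scaler_param {
    # exact field names depend on this fork; not verified
  }
}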

However, what I just said is not consistent with the rest of the code, because L_ins has both a gradient scaling factor of -0.1 in its GRL and a loss weight of 0.1 at the dc_loss output. I think the latter factor of 0.1 is cancelled out by the learning-rate multipliers of the domain-classifier layers in between. Still, when its gradient is mixed with the Faster R-CNN loss, it seems to be weighted in with only a factor of 0.01.
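To make that bookkeeping explicit (this is just my reading of the prototxt, so please double-check): the gradient of L_ins that reaches the shared base network passes through both the loss_weight at dc_loss and the GRL,

$$
\underbrace{0.1}_{\text{loss\_weight}} \times \underbrace{(-0.1)}_{\text{GRL}} \times \frac{\partial L_{ins}}{\partial \theta_{base}} = -0.01\,\frac{\partial L_{ins}}{\partial \theta_{base}},
$$

while the dc_ip* parameters themselves are updated at an effective rate of 0.1 x lr_mult = 1x (weights) or 2x (biases) the base learning rate, which is the cancellation I mentioned above.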

As for L_cst, I have not found any code for it in this repository. I think you will need to use the Caffe2 implementation for that; see #4