happynear / AMSoftmax

A simple yet effective loss function for face verification.

How to set m when the features are not normalized?

Allan9977 opened this issue · comments

Hi, your paper reports results for AM-Softmax w/o FN with m = 0.35 and 0.4.
(1) With FN: psi(theta) = s * (cos(theta) - m), with s = 30 and m = 0.35
#prototxt
layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param {
    lr_mult: 1
  }
  inner_product_param {
    num_output: 10516
    normalize: true
    weight_filler {
      type: "xavier"
    }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param {
    bias: -0.35
  }
}
layer {
  name: "fc6_margin_scale"
  type: "Scale"
  bottom: "fc6_margin"
  top: "fc6_margin_scale"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  scale_param {
    filler {
      type: "constant"
      value: 30
    }
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin_scale"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}
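
To make sure I understand, here is a rough NumPy sketch of what this layer stack computes; the function name and shapes are only illustrative, not code from this repo:

#python (illustrative)
import numpy as np

def am_softmax_loss_with_fn(x, W, labels, s=30.0, m=0.35):
    # x: (N, D) features, W: (D, C) class weights, labels: (N,) int class ids
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)  # feature normalization ("norm1")
    W_n = W / np.linalg.norm(W, axis=0, keepdims=True)  # weight normalization (normalize: true)
    cos = x_n @ W_n                                     # cos(theta), shape (N, C)
    cos[np.arange(len(labels)), labels] -= m            # LabelSpecificAdd with bias -m
    logits = s * cos                                    # fixed Scale layer, value s = 30
    logits -= logits.max(axis=1, keepdims=True)         # for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()  # SoftmaxWithLoss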

(2) Without FN: s is not needed, psi(theta) = ||x|| * cos(theta) - m. Should I still use m = 0.35?
#prototxt
layer {
  name: "fc6_l2"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc6"
  param {
    lr_mult: 1
  }
  inner_product_param {
    num_output: 10516
    normalize: false
    weight_filler {
      type: "xavier"
    }
    bias_term: false
  }
}
layer {
  name: "label_specific_margin"
  type: "LabelSpecificAdd"
  bottom: "fc6"
  bottom: "label"
  top: "fc6_margin"
  label_specific_add_param {
    bias: -0.35
  }
}
layer {
  name: "softmax_loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6_margin"
  bottom: "label"
  top: "softmax_loss"
  loss_weight: 1
}
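
And a corresponding sketch of what I mean in (2), assuming the class weights are still L2-normalized so the logit equals ||x|| * cos(theta) (again only illustrative, not code from this repo):

#python (illustrative)
import numpy as np

def am_softmax_loss_wo_fn(x, W, labels, m=0.35):
    # x: (N, D) features (not normalized), W: (D, C) class weights, labels: (N,) int class ids
    W_n = W / np.linalg.norm(W, axis=0, keepdims=True)  # assume weights are still normalized
    logits = x @ W_n                                    # ||x|| * cos(theta)
    logits[np.arange(len(labels)), labels] -= m         # LabelSpecificAdd with bias -m
    logits -= logits.max(axis=1, keepdims=True)         # for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()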

Can you share your prototxt and training log? Thanks.

When we don't use feature normalization, it is still necessary to use an annealing strategy for m, like the lambda in SphereFace. So I need to change the code to add the annealing strategy. I will give an example tomorrow.
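
Roughly, the idea is the same as the lambda schedule in SphereFace: start with almost no margin and let the effective margin grow toward the target value as training goes on. A toy sketch of that schedule (the hyperparameter values here are only illustrative, not the ones used in this repo):

#python (illustrative)
def annealed_margin(iteration, m_target=0.35,
                    lam_base=1000.0, gamma=0.0001, power=2.0, lam_min=0.0):
    # SphereFace-style annealing: the target logit is blended as
    # (lam * cos(theta) + (cos(theta) - m_target)) / (1 + lam) = cos(theta) - m_target / (1 + lam),
    # so as lam decays the effective margin grows from near 0 toward m_target.
    lam = max(lam_min, lam_base * (1.0 + gamma * iteration) ** (-power))
    return m_target / (1.0 + lam)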

The example is uploaded. You also need to update label_specific_add_layer to get the annealing code.

https://github.com/happynear/AMSoftmax/blob/master/prototxt/face_train_test_wo_fn.prototxt

Note that I am still running the experiment to reproduce the result, so the prototxt may change in the next few hours...

By the way, I can get similar results on LFW BLUFR with normalization by fine-tuning the network with scale 60 and margin 0.4 after iteration 16000. Maybe you can also try this approach.

Thanks a lot! I will give it a try.