yulequan / HeartSeg

code of DenseVoxNet and 3D-DSN

Home Page: http://appsrv.cse.cuhk.edu.hk/~lqyu/DenseVoxNet/index.html

About test prototxt and choosing the loss_weight

mjohn123 opened this issue · comments

Thanks for sharing your great work!!

  • Fig. 2 shows the learning curves of the 3D DSN, in which the 3D DSN validation loss is larger than the 3D DSN training loss. Do you use loss_weight (auxiliary classifiers) in the validation .prototxt? To my knowledge, loss_weight is only used in the training .prototxt.
  • One more thing: how do you choose the loss weight value for each auxiliary classifier? In your work "Volumetric-ConvNet-for-prostate-segmentation" the loss_weight is smaller than 1, but here it is bigger than 1.
  1. The train loss and validation loss in Fig. 2 are both the main classifier loss.

  2. What matters is the relative weight (the ratio of the different weights) between the main classifier and the auxiliary classifiers. The following two cases are equivalent (see the sketch after this list): (1) learning rate 0.01 with loss weights 0.1:0.2:0.5; (2) learning rate 0.001 with loss weights 1:2:5.

  3. As for the specific values of the loss weights, we choose them empirically according to the principle that the main classifier should have a larger weight than the auxiliary classifiers.
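A minimal sketch of why only the ratio matters, assuming plain SGD without momentum or weight decay (with those terms the equivalence is only approximate). The per-step update is

\[
\Delta\theta = -\eta \,\nabla_\theta \sum_i w_i L_i = -\eta \sum_i w_i \,\nabla_\theta L_i ,
\]

so rescaling every loss weight by a constant $c$ while dividing the learning rate by $c$,

\[
(\eta,\ \{w_i\}) \;\longrightarrow\; (\eta/c,\ \{c\,w_i\}),
\]

leaves $\Delta\theta$ unchanged. Case (1) above maps to case (2) with $c = 10$.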

Thanks. Is the main classifier displayed in "Iteration x, loss" or in "Train net output #0: loss" in the log below? As you said it is the main classifier loss, I guess it is "Train net output #0: loss", but Caffe uses "Iteration x, loss" to calculate the gradients. Am I right?

...solver.cpp:228] Iteration 0, loss = 0.7
...solver.cpp:244]     Train net output #0: loss = xx (* 5 = xx loss)
...solver.cpp:244]     Train net output #1: loss1 = xx (* 2 = xx loss)
...solver.cpp:244]     Train net output #2: loss2 = xx (* 3 = xx loss)

Second, what about the weights among the auxiliary classifiers? The loss weight of an auxiliary classifier at a deeper layer (e.g., the 15th layer) is bigger than that at a shallower layer (e.g., the 8th layer). Is that right? In my experiments, the loss of an auxiliary classifier at a deeper layer may be larger than that at a shallower layer, so a bigger loss weight in that case may be wrong. Right?

The main classifier is the "Train net output #0: loss". We use the total loss (main classifier + auxiliary classifiers) as the objective function, as Caffe does. As you said, we do not use loss weights in validation; therefore, we only plot the main classifier loss of training and validation for a meaningful comparison.
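Concretely, the objective that Caffe differentiates is the weighted sum of all loss tops; "Iteration x, loss" reports this total, while each "Train net output #k" line shows the unweighted value of one loss blob together with its loss_weight multiplier. Using the multipliers from the log excerpt above (which are illustrative placeholders, not the repo's actual values):

\[
L_{\text{total}} = \sum_i w_i L_i = 5\,L_{\text{main}} + 2\,L_{\text{aux1}} + 3\,L_{\text{aux2}} .
\]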

The loss weight of the auxiliary classifier at the deeper layer (15th) is bigger than that at the shallower layer (8th). The main classifier has the largest weight, because you can regard the main classifier as the auxiliary classifier at the last layer.
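A minimal train .prototxt sketch of this weighting scheme, assuming SoftmaxWithLoss layers; the blob names (score_main, score_aux1, score_aux2) and the weights 5:2:3 are illustrative, echoing the log excerpt above, not the repo's actual values. The validation .prototxt would keep only the main loss layer, without a loss_weight:

layer {
  name: "loss"               # main classifier: largest weight ("Train net output #0")
  type: "SoftmaxWithLoss"
  bottom: "score_main"       # prediction from the last layer
  bottom: "label"
  top: "loss"
  loss_weight: 5
}
layer {
  name: "loss1"              # auxiliary classifier at the shallower layer (8th)
  type: "SoftmaxWithLoss"
  bottom: "score_aux1"
  bottom: "label"
  top: "loss1"
  loss_weight: 2
}
layer {
  name: "loss2"              # auxiliary classifier at the deeper layer (15th)
  type: "SoftmaxWithLoss"
  bottom: "score_aux2"
  bottom: "label"
  top: "loss2"
  loss_weight: 3
}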