Difference between paper and code
valencebond opened this issue · comments
mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)
output = model(mixed_feature, classifier_flag=True)
According to the code, the objective introduced in Section 4.3 may not be achieved, since the features are concatenated and then passed through only a single classifier.
Cumulative learning strategy is proposed to shift the learning focus between the bilateral branches by controlling the weights for features produced by two branches and the classification loss L.
would you mind telling me the reason behind this change?
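For context, a quick sketch of why the two forms are closely related: a single linear classifier over the concatenated features is algebraically the same as two per-branch linear heads whose logits are summed, which is the form written in the paper. The dimensions and weights below are made up purely for illustration and are not taken from the repository.

```python
import torch

torch.manual_seed(0)
d, c = 4, 3                      # feature dim per branch, number of classes (illustrative)
f_a, f_b = torch.randn(d), torch.randn(d)
l = 0.3                          # mixing coefficient

# Single classifier over concatenated features (as in the code, ignoring the factor 2,
# which only rescales both terms equally).
W = torch.randn(c, 2 * d)
mixed = torch.cat((l * f_a, (1 - l) * f_b))
logits_concat = W @ mixed

# Split W into two per-branch heads and sum their logits (the paper's form).
W_a, W_b = W[:, :d], W[:, d:]
logits_sum = W_a @ (l * f_a) + W_b @ ((1 - l) * f_b)

assert torch.allclose(logits_concat, logits_sum, atol=1e-5)
```

So for a linear classifier the concat implementation and the weighted-sum formulation express the same function, just parameterized differently.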
Thanks! Does that mean manifold mixup by concatenation works better than the original summation? Also, why is the scale factor of 2 needed?
mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)
The scale of 2 ensures the gradient is consistent with the default combiner (e.g., when the two samplers draw the same picture and l = 0.5).
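A minimal sketch of that argument: if both samplers happen to return the same image (so `feature_a == feature_b`) and `l = 0.5`, the factor of 2 makes the mixed feature reduce to a plain concatenation of the two branch features; without it, the feature (and hence the gradient) would be halved. The tensor shape here is arbitrary, chosen just for the example.

```python
import torch

torch.manual_seed(0)
f = torch.randn(8)               # illustrative feature vector
feature_a = feature_b = f        # both samplers returned the same image
l = 0.5

mixed = 2 * torch.cat((l * feature_a, (1 - l) * feature_b))
plain = torch.cat((feature_a, feature_b))

# With the factor 2, the mixed feature matches the plain concatenation;
# without it, mixed would equal 0.5 * plain.
assert torch.allclose(mixed, plain)
```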
Hi @valencebond @ZhouBoyan, is this equivalent to what is described in the paper, or does the concat method perform better than the original sum?
Thanks