Difference between paper and code
valencebond opened this issue · comments
mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)
output = model(mixed_feature, classifier_flag=True)
According to the code, the objective introduced in Section 4.3 may not be achieved, since the features are concatenated and then passed through only a single classifier.
Cumulative learning strategy is proposed to shift the learning focus between the bilateral branches by controlling the weights for features produced by two branches and the classification loss L.
would you mind telling me the reason behind this change?
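For context, a quick sketch of why the two forms are closely related: a single linear classifier over the concatenated features is algebraically the same as two per-branch linear heads whose logits are summed, which is the form written in the paper. The dimensions and weights below are made up purely for illustration and are not taken from the repository.

```python
import torch

torch.manual_seed(0)
d, c = 4, 3                      # feature dim per branch, number of classes (illustrative)
f_a, f_b = torch.randn(d), torch.randn(d)
l = 0.3                          # mixing coefficient

# Single classifier over concatenated features (as in the code, ignoring the factor 2,
# which only rescales both terms equally).
W = torch.randn(c, 2 * d)
mixed = torch.cat((l * f_a, (1 - l) * f_b))
logits_concat = W @ mixed

# Split W into two per-branch heads and sum their logits (the paper's form).
W_a, W_b = W[:, :d], W[:, d:]
logits_sum = W_a @ (l * f_a) + W_b @ ((1 - l) * f_b)

assert torch.allclose(logits_concat, logits_sum, atol=1e-5)
```

So for a linear classifier the concat implementation and the weighted-sum formulation express the same function, just parameterized differently.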
Thanks! Does that mean manifold mixup by concatenation works better than the original summation? Also, why is the scale factor of 2 needed?
mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)
The scale of 2 ensures the gradient is consistent with the default combiner (e.g., when the two samplers draw the same picture and l = 0.5).
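A minimal sketch of that argument: if both samplers happen to return the same image (so `feature_a == feature_b`) and `l = 0.5`, the factor of 2 makes the mixed feature reduce to a plain concatenation of the two branch features; without it, the feature (and hence the gradient) would be halved. The tensor shape here is arbitrary, chosen just for the example.

```python
import torch

torch.manual_seed(0)
f = torch.randn(8)               # illustrative feature vector
feature_a = feature_b = f        # both samplers returned the same image
l = 0.5

mixed = 2 * torch.cat((l * feature_a, (1 - l) * feature_b))
plain = torch.cat((feature_a, feature_b))

# With the factor 2, the mixed feature matches the plain concatenation;
# without it, mixed would equal 0.5 * plain.
assert torch.allclose(mixed, plain)
```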
Hi @valencebond @ZhouBoyan, is this equivalent to what is described in the paper, or does the concat method perform better than the original sum?
Thanks