megvii-research / BBN

The official PyTorch implementation of the paper "BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition"

Home Page: https://arxiv.org/abs/1912.02413


Difference between paper and code

valencebond opened this issue

commented

[Screenshot: the cumulative learning formulation from Section 4.3 of the paper]

the corresponding code is

mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)  # weight each branch's feature by l / (1 - l), then concatenate
output = model(mixed_feature, classifier_flag=True)  # feed the mixed feature through the classifier

According to the code, the objective introduced in Section 4.3 may not be achievable, since the features are concatenated and then fed into only a single classifier.

> Cumulative learning strategy is proposed to shift the learning focus between the bilateral branches by controlling the weights for features produced by two branches and the classification loss L.
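
For reference, Section 4.3 of the paper combines the two branches and their losses as (paraphrasing the paper's notation):

$z = \alpha\, W_c^{\top} f_c + (1 - \alpha)\, W_r^{\top} f_r, \qquad L = \alpha\, E(\hat{p}, y_c) + (1 - \alpha)\, E(\hat{p}, y_r)$

where $f_c, f_r$ are the features from the conventional and re-balancing branches, $W_c, W_r$ their classifier weights, $\alpha$ the cumulative learning weight, and $E$ the cross-entropy loss.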

Would you mind telling me the reason behind this change?

Actually, the two fully connected layers can be merged into one for simplicity.
Please refer to the formula below:

$W \begin{bmatrix} l\, f_a \\ (1-l)\, f_b \end{bmatrix} = l\, W_a f_a + (1-l)\, W_b f_b, \qquad W = [\, W_a \;\; W_b \,]$

That is, a single classifier applied to the concatenated features realizes exactly the weighted sum of two per-branch classifiers.
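
To make the equivalence concrete, here is a minimal, runnable sketch; the dimensions, the bias-free linear layer, and all variable names are illustrative assumptions rather than code from this repo:

```python
import torch

d, c = 64, 10                   # assumed feature dimension and number of classes
feature_a = torch.randn(1, d)   # stand-in for the conventional-branch feature
feature_b = torch.randn(1, d)   # stand-in for the re-balancing-branch feature
l = 0.7                         # mixing weight

# A single merged classifier over the concatenation (bias omitted for clarity).
W = torch.randn(c, 2 * d)
mixed = torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)
out_concat = mixed @ W.t()

# Split W column-wise into the two per-branch classifiers it merges.
W_a, W_b = W[:, :d], W[:, d:]
out_sum = (l * feature_a) @ W_a.t() + ((1 - l) * feature_b) @ W_b.t()

assert torch.allclose(out_concat, out_sum, atol=1e-5)
```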

commented

Thanks! Does that mean manifold mixup by concatenation is a better way than the original sum? What's more, why is the scale of 2 needed?

mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)

The scale of 2 is to keep the gradient consistent with the default combiner (e.g., when the two samplers sample the same picture and l = 0.5).
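
A quick sanity check of that point (a sketch with made-up shapes): when l = 0.5 and both samplers return the same picture, the factor of 2 makes the mixed feature identical to simply duplicating the unweighted feature, so the activations, and therefore the gradients, match the default combiner:

```python
import torch

f = torch.randn(1, 64)  # both branches see the same image, so both produce f
l = 0.5
mixed = 2 * torch.cat((l * f, (1 - l) * f), dim=1)

# Without the factor of 2 each half would be 0.5 * f; with it, the mixed
# feature is exactly [f, f].
assert torch.allclose(mixed, torch.cat((f, f), dim=1))
```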

> Thanks! Does that mean manifold mixup by concatenation is a better way than the original sum? What's more, why is the scale of 2 needed?
>
> mixed_feature = 2 * torch.cat((l * feature_a, (1 - l) * feature_b), dim=1)

Hi @valencebond @ZhouBoyan, is this equivalent to what is mentioned in the paper, or does the concat method perform better than the original sum?

Thanks