arthurdouillard / incremental_learning.pytorch

Hi,
In this line of your code, the first conv uses kernel_size=3, stride=1, and padding=1. This model is used for ImageNet training. However, the original ResNet (and all of PyTorch's ResNet implementations) uses kernel_size=7, stride=2, and padding=3 for the first conv. Is there a reason for this change? I couldn't find it mentioned in your PODNet paper either.

With this kernel size and stride, the spatial output of the first conv layer is H0 x W0 = 224 x 224, whereas the original ResNet implementation reduces the spatial resolution to 112 x 112. This not only uses an abnormally large amount of memory (around 3x that of the original implementation on ResNet-18), but also requires much more computation (one training iteration is around 7x slower than the original on ResNet-18).
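For reference, the 224 vs. 112 figures can be checked with the standard conv output-size formula. This is only a small sketch: the helper name `conv_out_size` is made up for illustration, and it assumes dilation 1 and floor rounding (the `nn.Conv2d` defaults).

```python
def conv_out_size(size, kernel_size, stride, padding):
    """Spatial output size of a conv layer (assumes dilation=1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1


in_size = 224  # ImageNet-style input resolution

# First conv as written in my_resnet.py: kernel_size=3, stride=1, padding=1
repo_stem = conv_out_size(in_size, kernel_size=3, stride=1, padding=1)

# First conv of the original ResNet: kernel_size=7, stride=2, padding=3
orig_stem = conv_out_size(in_size, kernel_size=7, stride=2, padding=3)

# Ratio of activation elements per channel right after the first conv
ratio = (repo_stem ** 2) / (orig_stem ** 2)

print(repo_stem, orig_stem, ratio)  # 224 112 4.0
```

So each channel of the first feature map holds 4x as many elements, and every later layer inherits that larger resolution. The 3x memory and 7x time figures above are measurements, not something this arithmetic derives.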

I followed Rebuffi's ResNet implementation. The line in question:

incremental_learning.pytorch/inclearn/convnet/my_resnet.py, line 207 in 0d25c2e

    self.conv_1_3x3 = nn.Conv2d(channels, nf, kernel_size=3, stride=1, padding=1, bias=False)

I recall there was a paper on ResNet for CIFAR that proposed using a smaller kernel size for the first conv, since the images are much smaller (32x32 vs. 224x224).
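To illustrate why a smaller first conv makes sense at CIFAR resolution, here is a rough sketch that traces a 32x32 input through both stems. It assumes the ImageNet stem is a 7x7 stride-2 conv followed by a 3x3 stride-2 max pool with padding 1 (as in torchvision), and that the small stem is a single 3x3 stride-1 conv with no pooling. The helper `out_size` is a hypothetical name, not part of this repository.

```python
def out_size(size, kernel_size, stride, padding):
    """Spatial output size of a conv or pooling layer (dilation=1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1


cifar = 32  # CIFAR image side length

# ImageNet-style stem (assumed): 7x7 conv, stride 2, padding 3,
# then 3x3 max pool, stride 2, padding 1
after_conv = out_size(cifar, kernel_size=7, stride=2, padding=3)
after_pool = out_size(after_conv, kernel_size=3, stride=2, padding=1)

# CIFAR-style stem (assumed): single 3x3 conv, stride 1, padding 1, no pooling
small_stem = out_size(cifar, kernel_size=3, stride=1, padding=1)

print(after_conv, after_pool, small_stem)  # 16 8 32
```

With the ImageNet-style stem, a 32x32 image is already down to 8x8 before the first residual block, while the 3x3 stride-1 conv keeps the full 32x32 resolution. On 224x224 inputs the same stride-1 conv keeps 224x224, which is the cost raised in this issue.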

Why is the first conv layer different from the original ResNet?