There is a convolution layer with kernel_size=1 and stride=2.
divyanshj16 opened this issue · comments
Divyansh Jha commented
Fisher Yu commented
The quoted code snippet doesn't show stride=2. For some part of the network, the downsampling layers are only for subsampled residual connection. The full information is used in the 3x3 layers, while some information might be lost in the residual connections. This design follows the original residual networks for direct comparison.