Deconvolutional Layer is slow for large kernel size
yangwenca opened this issue · comments
The deconvolutional layer is very slow for large kernel sizes. With kernel size 16x16x4x2, input size 32x32x4, output size 256x256x2, padding 4 (top, bottom, left, right), stride 8 (height, width), and dilation 1 (height, width), the run time can be as slow as a few hundred ms.
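For reference, the output size above follows from the standard transposed-convolution output-size formula (a small sketch; the helper name is ours, not a library API):

```python
def conv_transpose2d_out(in_size, kernel, stride, padding, dilation=1, output_padding=0):
    # Standard transposed-convolution output-size formula:
    # (in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1
    return (in_size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

# Parameters from this issue: input 32x32, kernel 16, stride 8, padding 4
h = conv_transpose2d_out(32, 16, 8, 4)
w = conv_transpose2d_out(32, 16, 8, 4)
print(h, w)  # 256 256
```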
This isn't surprising: you use a 16x16 kernel, which is 256x more expensive than a 1x1 kernel.
@Maratyszcza I would like to know whether we can use torch.nn.Upsample here as a substitute for ConvTranspose2d, and whether torch.nn.Upsample (linear, bilinear, trilinear) is supported in QNNPACK?
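Note that Upsample alone has no learned weights, so it cannot replace ConvTranspose2d by itself; the common substitution is upsample followed by a small regular convolution. A minimal sketch with the channel counts and 8x upscaling from this issue (the 3x3 kernel is an illustrative choice, not a drop-in equivalent of the 16x16 transposed kernel):

```python
import torch
import torch.nn as nn

# Upsample-then-convolve substitute for ConvTranspose2d. The cheap
# interpolation does the 8x spatial scaling, and a small 3x3 conv does
# the learned mixing, avoiding the large-kernel cost.
model = nn.Sequential(
    nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
    nn.Conv2d(in_channels=4, out_channels=2, kernel_size=3, padding=1),
)

x = torch.randn(1, 4, 32, 32)  # NCHW input from the issue
y = model(x)
print(tuple(y.shape))  # (1, 2, 256, 256)
```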