google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Support int8 transposed convolutions with per-channel weight quantization

lgeiger opened this issue

TFLite uses int8 per-channel weight quantization for transposed convolutions.
While XNNPACK includes a fast transposed convolution operator, it only supports per-tensor weight quantization (i.e. a single quantization scale for the whole weight tensor). This means transposed convolutions in a TFLite QAT int8 model are currently not supported by XNNPACK and won't be accelerated.

It would be excellent if XNNPACK added support for per-channel quantized weights in the transposed convolution op to match the behaviour of the regular convolution.
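
For reference, here is a minimal sketch of producing an int8 TFLite model that contains a TRANSPOSE_CONV op. It uses post-training full-integer quantization instead of a QAT workflow for brevity (the model, shapes, and file name are placeholders, not from this issue), but the converter emits per-channel quantized weights in both flows:

```python
import numpy as np
import tensorflow as tf

# Tiny model with a single transposed convolution.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 4)),
    tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=3,
                                    strides=2, padding="same"),
])

def representative_dataset():
    # Calibration data for full-integer quantization.
    for _ in range(32):
        yield [np.random.rand(1, 8, 8, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("transpose_conv_int8.tflite", "wb") as f:
    f.write(converter.convert())
```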

Does TFLite support TRANSPOSE_CONV with per-channel quantization? Last time I looked at it, it wasn't supported there, thus I didn't implement it in XNNPACK.

Does TFLite support TRANSPOSE_CONV with per-channel quantization?

Yes it does. The TRANSPOSE_CONV kernel supports both, but internally it converts per-tensor to per-channel in both cases using this code path.
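
Conceptually, supporting both layouts in one kernel just means expanding a single per-tensor scale into a per-channel scale vector before running the per-channel path. A rough numpy sketch of that idea (not the actual TFLite code behind the linked code path):

```python
import numpy as np

def expand_weight_scales(weight_scales, num_output_channels):
    """Return one scale per output channel.

    Accepts either a scalar (per-tensor quantization) or a vector with one
    scale per output channel (per-channel quantization).
    """
    scales = np.atleast_1d(np.asarray(weight_scales, dtype=np.float64))
    if scales.size == 1:
        # Per-tensor case: replicate the single scale so the per-channel
        # kernel can be used unconditionally.
        return np.full(num_output_channels, scales[0])
    assert scales.size == num_output_channels
    return scales

print(expand_weight_scales(0.02, 4))          # [0.02 0.02 0.02 0.02]
print(expand_weight_scales([0.01, 0.03], 2))  # [0.01 0.03]
```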

The MLIR converter will also always output per-channel quantized weights as far as I can tell.
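
One way to verify this is to read the quantization parameters of the converted model back through the TFLite interpreter. A sketch, assuming the model was saved as transpose_conv_int8.tflite (the placeholder file name from the earlier sketch):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="transpose_conv_int8.tflite")

for detail in interpreter.get_tensor_details():
    params = detail["quantization_parameters"]
    # Per-channel quantized weights carry one scale per output channel;
    # per-tensor quantization would show a single scale instead.
    if len(params["scales"]) > 1:
        print(detail["name"],
              "num_scales:", len(params["scales"]),
              "quantized_dimension:", params["quantized_dimension"])
```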

Does TFLite support TRANSPOSE_CONV with per-channel quantization?

@Maratyszcza I double-checked this again, and indeed TFLite supports per-channel quantization; the current converter will always generate per-channel quantized transposed convolutions when using QAT.