google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Support int8 transposed convolutions with per-channel weight quantization

lgeiger opened this issue

TFLite uses int8 per-channel weight quantization for transposed convolutions.
While XNNPACK includes a fast transposed convolution operator, it only supports per-tensor weight quantization (i.e. a single quantization scale for the whole weight tensor). This means transposed convolutions in a TFLite QAT int8 model are currently not supported by XNNPACK and won't be accelerated.

It would be excellent if XNNPACK added support for per-channel quantized weights in the transposed convolution op to match the behaviour of the regular convolution.
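
For reference, here is a minimal sketch of producing an int8 TFLite model that contains a TRANSPOSE_CONV op. It uses post-training full-integer quantization instead of a QAT workflow for brevity (the model, shapes, and file name are placeholders, not from this issue), but the converter emits per-channel quantized weights in both flows:

```python
import numpy as np
import tensorflow as tf

# Tiny model with a single transposed convolution.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 4)),
    tf.keras.layers.Conv2DTranspose(filters=16, kernel_size=3,
                                    strides=2, padding="same"),
])

def representative_dataset():
    # Calibration data for full-integer quantization.
    for _ in range(32):
        yield [np.random.rand(1, 8, 8, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("transpose_conv_int8.tflite", "wb") as f:
    f.write(converter.convert())
```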

Does TFLite support TRANSPOSE_CONV with per-channel quantization? Last time I looked at it, it wasn't supported there, thus I didn't implement it in XNNPACK.

Does TFLite support TRANSPOSE_CONV with per-channel quantization?

Yes it does. The TRANSPOSE_CONV kernel supports both, but internally it converts per-tensor to per-channel in both cases using this code path.
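
Conceptually, supporting both layouts in one kernel just means expanding a single per-tensor scale into a per-channel scale vector before running the per-channel path. A rough numpy sketch of that idea (not the actual TFLite code behind the linked code path):

```python
import numpy as np

def expand_weight_scales(weight_scales, num_output_channels):
    """Return one scale per output channel.

    Accepts either a scalar (per-tensor quantization) or a vector with one
    scale per output channel (per-channel quantization).
    """
    scales = np.atleast_1d(np.asarray(weight_scales, dtype=np.float64))
    if scales.size == 1:
        # Per-tensor case: replicate the single scale so the per-channel
        # kernel can be used unconditionally.
        return np.full(num_output_channels, scales[0])
    assert scales.size == num_output_channels
    return scales

print(expand_weight_scales(0.02, 4))          # [0.02 0.02 0.02 0.02]
print(expand_weight_scales([0.01, 0.03], 2))  # [0.01 0.03]
```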

The MLIR converter will also always output per-channel quantized weights as far as I can tell.
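
One way to verify this is to read the quantization parameters of the converted model back through the TFLite interpreter. A sketch, assuming the model was saved as transpose_conv_int8.tflite (the placeholder file name from the earlier sketch):

```python
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="transpose_conv_int8.tflite")

for detail in interpreter.get_tensor_details():
    params = detail["quantization_parameters"]
    # Per-channel quantized weights carry one scale per output channel;
    # per-tensor quantization would show a single scale instead.
    if len(params["scales"]) > 1:
        print(detail["name"],
              "num_scales:", len(params["scales"]),
              "quantized_dimension:", params["quantized_dimension"])
```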

Does TFLite support TRANSPOSE_CONV with per-channel quantization?

@Maratyszcza I double-checked this again, and indeed TFLite supports per-channel quantization; the current converter will always generate per-channel quantized transposed convolutions when using QAT.