google / gemmlowp

Low-precision matrix multiplication

How to quantize accumulator from int32 to uint8

kmhatre14 opened this issue · comments

I am trying to implement the quantized version of MobileNet v1 in OpenCL, referencing the method you provide in https://arxiv.org/pdf/1712.05877.pdf. I am using pretrained MobileNet weights from the tflite file, and I have extracted all the required quantization parameters (e.g., S1, S2, and S3) from it.
The only issue is converting the accumulator back from int32 to uint8.
The gemmlowp kernel uses the min and max of the output tensor to quantize the accumulator from int32 to uint8. But since my implementation is in OpenCL, I cannot get the min and max values of the output tensor at runtime; I would have to write additional logic on the host side, which would incur extra execution time.

M = (S1*S2)/S3
To quantize the accumulator, I am currently using q = (int32 * M) + bias.
But this output does not match the intermediate output obtained from the TensorFlow Lite API.
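For comparison, the order of operations described in the paper is: add the int32 bias (itself quantized with scale S1*S2) to the accumulator first, then multiply by M, then add the output zero point Z3, then clamp to [0, 255]. Below is a minimal float-reference sketch of that path; the names (`acc`, `bias_q`, `Z3`) are illustrative, not gemmlowp API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Float-reference sketch of the requantization order in arXiv:1712.05877.
// Assumes bias_q was pre-quantized with scale S1*S2, so it is added to the
// raw int32 accumulator BEFORE the multiplier; Z3 is added AFTER.
std::uint8_t RequantizeReference(std::int32_t acc, std::int32_t bias_q,
                                 double M /* = S1*S2/S3 */, std::int32_t Z3) {
  const std::int32_t with_bias = acc + bias_q;
  // The efficient integer path replaces this float multiply with a
  // fixed-point multiplier plus right shift (see later in this thread).
  const std::int32_t scaled =
      static_cast<std::int32_t>(std::lround(M * with_bias));
  const std::int32_t shifted = scaled + Z3;
  return static_cast<std::uint8_t>(
      std::min<std::int32_t>(255, std::max<std::int32_t>(0, shifted)));
}
```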

Our quantization scheme (that paper) assumes that the min-max ranges of all arrays involved, including the output activation tensor, are given. I don't know how to perform quantized inference in a way that's efficient and accurate without that datum.
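In other words, the output min/max is exactly what determines S3 and Z3. A minimal sketch of that derivation under the usual uint8 affine scheme real = S * (q - Z), assuming the range has been nudged so that it contains zero (as the paper requires):

```cpp
#include <cmath>
#include <cstdint>

// Sketch: derive the output scale S3 and zero point Z3 from a given
// min/max range, for uint8 quantization. Assumes min <= 0 <= max.
void ScaleAndZeroPointFromMinMax(float min, float max,
                                 float* S3, std::int32_t* Z3) {
  *S3 = (max - min) / 255.0f;
  // The zero point is the uint8 value that represents real 0 exactly.
  *Z3 = static_cast<std::int32_t>(std::lround(-min / *S3));
}
```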

I am using the data provided by "mobilenet_v1_1.0_224_quant.tflite".
The tflite file provides the following details for the first layer:
input : -1 ≤ 0.0078125 * (q - 128) ≤ 0.9921875
weights : -3.265998125076294 ≤ 0.02182667888700962 * (q - 151) ≤ 2.2779781818389893
bias : 0.00017052092880476266 * q

So from the above data, I assume:
S1 = 0.0078125
S2 = 0.02182667888700962
S3 = 0.00017052092880476266
Z1 = -128
Z2 = -151
Z3 = 0
Thus the quantization multiplier is
M = (S1*S2)/S3
After converting to a fixed-point multiplier, this becomes
int Multiplier = 1992157696
Bit shift = 7
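For reference, a sketch of how such a (multiplier, shift) pair can be derived from M, along the lines of the QuantizeMultiplierSmallerThanOne routine in the paper's appendix (assumes 0 < M < 1; M ≈ 0.00725 here reproduces Multiplier = 1992157696, shift = 7):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch: express M (assumed < 1) as M0 * 2^(-right_shift), where M0 is a
// fixed-point multiplier in [2^30, 2^31) interpreted as M0 / 2^31.
void QuantizeMultiplier(double M, std::int32_t* quantized_multiplier,
                        int* right_shift) {
  assert(M > 0.0 && M < 1.0);
  int exponent = 0;
  const double M0 = std::frexp(M, &exponent);  // M = M0 * 2^exponent, M0 in [0.5, 1)
  *right_shift = -exponent;                    // exponent is negative for M < 1
  std::int64_t q = static_cast<std::int64_t>(std::round(M0 * (1ll << 31)));
  if (q == (1ll << 31)) {  // M0 rounded up to exactly 1.0: renormalize
    q /= 2;
    --*right_shift;
  }
  *quantized_multiplier = static_cast<std::int32_t>(q);
}
```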

final_op = ((conv_acc_int32 * Multiplier) / 2^31) >> Shift
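Note that a plain "/ 2^31" and ">> Shift" both truncate, whereas gemmlowp rounds at both steps (see SaturatingRoundingDoublingHighMul and RoundingDivideByPOT in gemmlowp's fixedpoint/fixedpoint.h); that rounding difference alone can produce small mismatches of the kind described below. A scalar sketch of those two helpers:

```cpp
#include <cstdint>
#include <limits>

// Rounding "doubling high multiply": returns the high 32 bits of 2*a*b,
// rounded to nearest, saturating the single overflow case (min * min).
std::int32_t SaturatingRoundingDoublingHighMul(std::int32_t a, std::int32_t b) {
  const bool overflow =
      a == b && a == std::numeric_limits<std::int32_t>::min();
  const std::int64_t ab = static_cast<std::int64_t>(a) * b;
  const std::int64_t nudge = ab >= 0 ? (1 << 30) : (1 - (1 << 30));
  const std::int32_t ab_x2_high32 =
      static_cast<std::int32_t>((ab + nudge) / (1ll << 31));
  return overflow ? std::numeric_limits<std::int32_t>::max() : ab_x2_high32;
}

// Divide by 2^exponent, rounding to nearest with ties away from zero,
// instead of the truncation performed by a bare ">>".
std::int32_t RoundingDivideByPOT(std::int32_t x, int exponent) {
  const std::int32_t mask = static_cast<std::int32_t>((1ll << exponent) - 1);
  const std::int32_t remainder = x & mask;
  const std::int32_t threshold = (mask >> 1) + (x < 0 ? 1 : 0);
  return (x >> exponent) + (remainder > threshold ? 1 : 0);
}

// Usage sketch: after this, add Z3 and clamp to [0, 255] as usual.
// final_op = RoundingDivideByPOT(
//     SaturatingRoundingDoublingHighMul(conv_acc_int32, Multiplier), Shift);
```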

There is a slight difference between the output generated by the TensorFlow Lite code and the output generated by my code for the 1st layer.
E.g.: my output = 161, TensorFlow Lite = 163.

But I think this difference amplifies, and at the end I get a wrong classification.

Is the procedure right?

Thank you @bjacob

@kmhatre14 Have you fixed the accuracy problem? After the matrix multiplication, the error will grow.

The small difference in the output does not affect the classification: the output of our activation map is 99% similar to the activation maps generated by TFLite, and the confidence is also 99% the same as the TFLite output.

The code below is tested and verified.
https://github.com/Ushma30/MobileNet-V1/tree/MobileNet-V1-Quantized