google / gemmlowp

Low-precision matrix multiplication

int8*int8 -> float?

XapaJIaMnu opened this issue · comments

Hey,

I'm looking to perform int8 * int8 -> fp32, where at the output stage I dequantise the int32_t result into float (and then potentially add a bias). I was following the example from https://github.com/google/gemmlowp/blob/master/doc/quantization_example.cc#L305
But it seems that in order to dequantise to float you compute the quantisation parameters from the fp32 result that you had already computed before, which in practice I wouldn't know. I can compute it with a compensation factor, but that becomes incredibly complicated and computationally (and memory) expensive. Are there any alternatives?

If I can assume quantisation into int8, as opposed to uint8 as in the example, I can quantise without the zero_point parameter (assuming a zero-centred distribution), which would massively simplify dequantisation. Do you support this? Are there any examples in the codebase where something like this is done?

For such use cases, we typically have the matmul output raw int32 accumulators, then we do a pass outside of the matmul library converting those to float.

In gemmlowp, you get raw int32 accumulators simply by passing an empty output_pipeline, as in this part of the test:

gemmlowp/test/test.cc

Lines 1211 to 1230 in fda83bd

// Test an empty pipeline, i.e. returning raw int32 accumulators.
auto empty_pipeline = std::make_tuple();
GemmContext context;
GemmWithOutputPipeline<std::uint8_t, std::int32_t, DefaultL8R8BitDepthParams>(
    &context, lhs.const_map(), rhs.const_map(), &result_raw_int32, lhs_offset,
    rhs_offset, empty_pipeline);
for (int r = 0; r < rows; r++) {
  for (int c = 0; c < cols; c++) {
    std::int32_t expected = 0;
    for (int d = 0; d < depth; d++) {
      std::int32_t lhs_val =
          static_cast<std::int32_t>(lhs(r, d)) + lhs_offset;
      std::int32_t rhs_val =
          static_cast<std::int32_t>(rhs(d, c)) + rhs_offset;
      expected += lhs_val * rhs_val;
    }
    Check(expected == result_raw_int32(r, c));
  }
}

May I suggest taking a look at the ruy library instead of gemmlowp? It's basically gemmlowp's successor: it's what TFLite has been using by default on ARM for the last 18 months, and it supports both float and quantized, any combination of int8 and uint8, with or without a zero point, and more quantization flavor variations. I've added an example of getting raw int32 accumulators:
https://github.com/google/ruy/blob/878283640de7946a43053e8ebf4f15114fbc9156/example/example.cc#L129-L152

@bjacob thank you, that will do nicely. I think I'll use ruy.

Looking at the test, as far as I can see, only i8_i8_i32_i32 is supported, no i8_i8_i32_f32, so I'd have to do the float conversion outside of the multiply, correct?

Yes, exactly.