How is arithmetic performed on quantized values?
jeff830107 opened this issue
I am wondering how it performs arithmetic operations on quantized values.
Let's say the input is 10-bit and the weight is 6-bit: how does it perform a 10-bit x 6-bit multiplication?
Or does it not do "real" low-bit arithmetic at all?
(That is, does it simulate quantization by quantizing the values to 10-bit or 6-bit, dequantizing them back to 16-bit float or 8-bit int, and then using the usual 16-bit float or 8-bit int operations?)
Thanks for your help!!
Yes, you are right. It represents the 10-bit and 6-bit values as floating-point values and performs the multiplication in the float domain. So there is still a gap between this simulation and real low-bit computation.
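A minimal sketch of the simulated ("fake") quantization described above, in Python with only the standard library. The function names, the example data, the power-of-two scales, and the signed, per-tensor, round-to-nearest scheme are all assumptions for illustration, not this project's code. It compares a dot product computed in the float domain with one computed on the 10-bit and 6-bit integer codes. With a wide accumulator the two agree exactly here; a deliberately narrow, saturating 16-bit accumulator (a hypothetical choice, made only to make the effect visible) shows one way real low-bit arithmetic can diverge from the simulation.

```python
def quantize(x, scale, bits):
    """Round a real value to a signed `bits`-bit integer code, clamping to the range."""
    q_min, q_max = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(q_min, min(q_max, round(x / scale)))


def fake_quant(x, scale, bits):
    """Quantize and immediately dequantize: the result is still a float,
    but it can only take values on the low-bit grid."""
    return quantize(x, scale, bits) * scale


def simulated_dot(xs, ws, sx, sw):
    """Simulated low-bit dot product: fake-quantize, then multiply and add in float."""
    return sum(fake_quant(x, sx, 10) * fake_quant(w, sw, 6) for x, w in zip(xs, ws))


def integer_dot(xs, ws, sx, sw, acc_bits=32):
    """Integer dot product: 10-bit x 6-bit integer products summed in a saturating
    `acc_bits`-bit accumulator, rescaled to float once at the end."""
    acc_min, acc_max = -(1 << (acc_bits - 1)), (1 << (acc_bits - 1)) - 1
    acc = 0
    for x, w in zip(xs, ws):
        acc += quantize(x, sx, 10) * quantize(w, sw, 6)
        acc = max(acc_min, min(acc_max, acc))  # saturate instead of wrapping around
    return acc * (sx * sw)


# Hypothetical data; power-of-two scales keep every float step exact here.
xs = [1.95, 1.9, 1.85]
ws = [1.875, 1.8125, 1.75]
sx, sw = 1 / 256, 1 / 16

print(simulated_dot(xs, ws, sx, sw))             # 10.3359375
print(integer_dot(xs, ws, sx, sw))               # 10.3359375
print(integer_dot(xs, ws, sx, sw, acc_bits=16))  # 7.999755859375
```

Under the same signed two's-complement assumption, a single 10-bit x 6-bit product always fits in a signed 16-bit integer (the extreme case is (-512) x (-32) = 16384), so the mismatch in the last line comes from accumulating three products, not from any one multiply.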