Quantized matmul
WilliamTambellini opened this issue
William Tambellini commented
Add a new algorithm for quantized/dequantized matmul, e.g.:
https://oneapi-src.github.io/oneDNN/page_cpu_matmul_quantization_cpp.html#doxid-cpu-matmul-quantization-cpp
https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/quantized-mat-mul
https://github.com/cmp-nct/ggllm.cpp/blob/master/ggml-cuda.cu
Description
- What problem are you trying to solve?
  Fast matmul at reduced int precision
- (Optional) API of new function
  af::qmatmul(left, right)
- (Optional) Algorithms that could be used to implement this feature
  Standard dynamic or static quantization and dequantization
- (Optional) Are there other libraries that implement this feature?
  many: onednn, ggml, ...
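
To make the request concrete, here is a minimal reference sketch of what a dynamically quantized matmul could compute. It is plain C++ rather than ArrayFire code, and everything in it is an assumption for illustration: `QuantizedMatrix`, `quantize`, and `qmatmul_reference` are hypothetical names, and symmetric per-tensor int8 quantization is only one of several possible schemes. The proposed `af::qmatmul` does not exist yet; a real implementation would presumably run on the backend and could use per-channel scales or static (precomputed) scales instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: int8 values plus the scale used to quantize them.
struct QuantizedMatrix {
    std::vector<std::int8_t> data;  // row-major storage
    std::size_t rows;
    std::size_t cols;
    float scale;                    // real value ~= scale * quantized value
};

// Dynamic symmetric per-tensor quantization: the scale is derived from the
// data at call time (a static scheme would take a precomputed scale instead).
QuantizedMatrix quantize(const std::vector<float>& m,
                         std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    float max_abs = 0.0f;
    for (float v : m) max_abs = std::max(max_abs, std::fabs(v));
    // Avoid dividing by zero for an all-zero matrix.
    const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

    QuantizedMatrix q{std::vector<std::int8_t>(m.size()), rows, cols, scale};
    for (std::size_t i = 0; i < m.size(); ++i) {
        const long r = std::lround(m[i] / scale);
        q.data[i] = static_cast<std::int8_t>(std::clamp(r, -127L, 127L));
    }
    return q;
}

// Hypothetical reference for the proposed qmatmul: quantize both operands,
// multiply in int8 with int32 accumulation, then dequantize the result.
// left is m x k, right is k x n, both row-major; the result is m x n.
std::vector<float> qmatmul_reference(const std::vector<float>& left,
                                     const std::vector<float>& right,
                                     std::size_t m, std::size_t k,
                                     std::size_t n) {
    const QuantizedMatrix a = quantize(left, m, k);
    const QuantizedMatrix b = quantize(right, k, n);

    std::vector<float> out(m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // int32 accumulator: each product is at most 127 * 127, so this
            // is safe for inner dimensions well beyond typical sizes.
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<std::int32_t>(a.data[i * k + p]) *
                       static_cast<std::int32_t>(b.data[p * n + j]);
            }
            // Dequantize: undo both input scales at once.
            out[i * n + j] = static_cast<float>(acc) * a.scale * b.scale;
        }
    }
    return out;
}
```

Calling `qmatmul_reference` on a 2x2 matrix and the identity returns the original values up to quantization error (roughly one part in 127 of the largest magnitude per operand), which is the accuracy/speed trade-off the feature is asking for.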