arrayfire / arrayfire

ArrayFire: a general purpose GPU library.

https://arrayfire.com

Quantized matmul

WilliamTambellini opened this issue a year ago

William Tambellini commented a year ago

Add a new function that performs quantized/dequantized matrix multiplication (matmul), e.g.:
https://oneapi-src.github.io/oneDNN/page_cpu_matmul_quantization_cpp.html#doxid-cpu-matmul-quantization-cpp
https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/quantized-mat-mul
https://github.com/cmp-nct/ggllm.cpp/blob/master/ggml-cuda.cu

Description

What problem are you trying to solve?
Fast matmul at reduced integer precision.
(Optional) API of new function
af::qmatmul(left, right)
(Optional) Algorithms that could be used to implement this feature
Standard dynamic or static quantization of the inputs, with dequantization of the result.
(Optional) Are there other libraries that implement this feature?
Many: oneDNN, ggml, ...
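
To make the request concrete, here is a minimal CPU reference sketch of dynamic, symmetric, per-tensor int8 quantization followed by an integer matmul and dequantization. It is only an illustration under assumptions: `QuantizedMatrix`, `quantize` and `qmatmul_reference` are hypothetical names, none of them exist in ArrayFire, and the int8 width and per-tensor scale are choices made for the example rather than part of the proposal. It uses only the C++ standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration only (not an ArrayFire type).
// Stored values satisfy: real_value ~= scale * q[i].
struct QuantizedMatrix {
    std::size_t rows;
    std::size_t cols;
    float scale;
    std::vector<std::int8_t> q;  // row-major
};

// Dynamic symmetric quantization: the scale is derived from the largest
// absolute value observed in the input at call time.
QuantizedMatrix quantize(const std::vector<float>& x,
                         std::size_t rows, std::size_t cols) {
    assert(x.size() == rows * cols);
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    // Avoid division by zero for an all-zero input.
    const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

    QuantizedMatrix out{rows, cols, scale,
                        std::vector<std::int8_t>(x.size())};
    for (std::size_t i = 0; i < x.size(); ++i) {
        long r = std::lround(x[i] / scale);
        r = std::min(127L, std::max(-127L, r));  // clamp to the int8 range used
        out.q[i] = static_cast<std::int8_t>(r);
    }
    return out;
}

// Integer matmul with int32 accumulation, then dequantization to float.
// The int32 accumulator is safe here only while the inner dimension stays
// small enough that cols * 127 * 127 fits in 32 bits.
std::vector<float> qmatmul_reference(const QuantizedMatrix& a,
                                     const QuantizedMatrix& b) {
    assert(a.cols == b.rows);
    std::vector<float> c(a.rows * b.cols);
    const float scale = a.scale * b.scale;
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < b.cols; ++j) {
            std::int32_t acc = 0;
            for (std::size_t k = 0; k < a.cols; ++k) {
                acc += static_cast<std::int32_t>(a.q[i * a.cols + k]) *
                       static_cast<std::int32_t>(b.q[k * b.cols + j]);
            }
            c[i * b.cols + j] = scale * static_cast<float>(acc);
        }
    }
    return c;
}
```

A static variant would take precomputed scales (for example from a calibration pass) instead of deriving them from the input on each call. A GPU implementation would presumably replace the triple loop with an int8 GEMM kernel, but that is beyond this sketch.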