arrayfire / arrayfire

ArrayFire: a general purpose GPU library.

Home Page: https://arrayfire.com

Quantized matmul

WilliamTambellini opened this issue

Add a new function for quantized/dequantized matmul, e.g.:
https://oneapi-src.github.io/oneDNN/page_cpu_matmul_quantization_cpp.html#doxid-cpu-matmul-quantization-cpp
https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/quantized-mat-mul
https://github.com/cmp-nct/ggllm.cpp/blob/master/ggml-cuda.cu

Description

  • What problem are you trying to solve?
    Fast matmul at reduced integer precision
  • (Optional) API of new function
    af::qmatmul(left, right)
  • (Optional) Algorithms that could be used to implement this feature
    Standard dynamic or static quantization and dequantization
  • (Optional) Are there other libraries that implement this feature?
    Many: oneDNN, ggml, ...
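
To make the request concrete, here is a minimal CPU reference sketch of what a dynamically quantized matmul could compute. It is only an illustration, not ArrayFire code: `QuantizedMatrix`, `quantize` and `qmatmul_reference` are hypothetical names, and the choices of symmetric per-tensor int8 quantization, int32 accumulation and row-major storage are assumptions rather than part of the proposed `af::qmatmul` API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: int8 values plus the scale needed to recover floats.
struct QuantizedMatrix {
    std::vector<std::int8_t> data;  // row-major quantized values
    float scale;                    // real value ~= scale * quantized value
};

// Dynamic, symmetric, per-tensor quantization (an assumed scheme):
// the scale is derived from the data at call time, no calibration step.
QuantizedMatrix quantize(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));

    QuantizedMatrix q;
    q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  // avoid dividing by zero
    q.data.reserve(x.size());
    for (float v : x) {
        long r = std::lround(v / q.scale);
        r = std::max(-127L, std::min(127L, r));  // clamp to the symmetric int8 range
        q.data.push_back(static_cast<std::int8_t>(r));
    }
    return q;
}

// C (m x n) = A (m x k) * B (k x n), all row-major.
// Inputs are quantized to int8, multiplied with int32 accumulation,
// and the result is dequantized back to float.
std::vector<float> qmatmul_reference(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t m, std::size_t k, std::size_t n) {
    const QuantizedMatrix qa = quantize(a);
    const QuantizedMatrix qb = quantize(b);
    const float out_scale = qa.scale * qb.scale;  // dequantization factor

    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Each product is at most 127 * 127, so int32 is safe for k up to ~100k.
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                acc += static_cast<std::int32_t>(qa.data[i * k + p]) *
                       static_cast<std::int32_t>(qb.data[p * n + j]);
            }
            c[i * n + j] = out_scale * static_cast<float>(acc);
        }
    }
    return c;
}
```

Calling `qmatmul_reference(a, b, m, k, n)` should agree with a float matmul up to quantization error. A static variant would take precomputed scales as arguments instead of deriving them inside `quantize`, and a GPU backend would presumably replace the inner loop with int8 dot-product kernels.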