elphinkuo / fgml

Acceleration library for Machine Learning, especially for large language models

fmgl

Acceleration library for Machine Learning, especially for large language models.

Uniform quantization of the Llama2 model, without block grouping.
Uniform quantization of the Llama2 model, with support for 64 × 64 block grouping.
Non-uniform dense-and-sparse quantization of Llama2 (3-bit, 4-bit), based on Hessian information.
Inference for dense-and-sparse 3-bit and 4-bit Llama2-7B.
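
To illustrate the difference between the first two items, here is a minimal NumPy sketch of uniform quantization applied once to a whole weight matrix and then independently to each 64 × 64 block. It is not this library's API: the function names (`quantize_uniform`, `dequantize_uniform`, `quantize_blockwise`), the asymmetric min/max scheme, and the random test matrix are all assumptions made for illustration.

```python
import numpy as np


def quantize_uniform(w, bits=4):
    """Asymmetric uniform quantization with a single scale and offset (hypothetical helper)."""
    levels = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    # Guard against a constant array, where the value range would be zero.
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    q = np.clip(np.round((w - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min


def dequantize_uniform(q, scale, w_min):
    """Map integer codes back to approximate floating-point values."""
    return q.astype(np.float32) * scale + w_min


def quantize_blockwise(w, bits=4, block=64):
    """Quantize each block x block tile with its own scale and offset.

    Returns the dequantized approximation of w.
    """
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "shape must be a multiple of block"
    w_hat = np.empty_like(w, dtype=np.float32)
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            tile = w[r:r + block, c:c + block]
            q, scale, w_min = quantize_uniform(tile, bits)
            w_hat[r:r + block, c:c + block] = dequantize_uniform(q, scale, w_min)
    return w_hat


# Synthetic stand-in for a weight matrix, with one outlier that widens the global range.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
w[0, 0] = 20.0

q, scale, w_min = quantize_uniform(w, bits=4)
w_hat_tensor = dequantize_uniform(q, scale, w_min)
w_hat_block = quantize_blockwise(w, bits=4, block=64)

err_tensor = float(np.mean((w - w_hat_tensor) ** 2))
err_block = float(np.mean((w - w_hat_block) ** 2))
print(f"MSE without grouping: {err_tensor:.5f}")
print(f"MSE with 64 x 64 grouping: {err_block:.5f}")
```

In this sketch the outlier stretches the quantization range of the whole matrix when no grouping is used, while with block grouping it only affects the tile that contains it. The cost is one extra scale and offset per block.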


Apache License 2.0