elphinkuo / fgml

Acceleration library for Machine Learning, especially for large language models

fgml

  • Uniform quantization of the Llama 2 model, without block grouping.
  • Uniform quantization of the Llama 2 model, with 64 × 64 block grouping.
  • Non-uniform dense-and-sparse quantization of Llama 2 (3-bit and 4-bit), guided by Hessian information.
  • Inference for dense-and-sparse 3-bit and 4-bit Llama 2 7B.
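The two uniform-quantization modes differ only in how many weights share one scale. The sketch below is an illustration under our own assumptions, not fgml's code: it uses asymmetric min-max quantization with round-to-nearest, shares one (scale, offset) pair across the whole matrix in the ungrouped case, and gives each 64 × 64 tile its own pair in the grouped case. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_uniform(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Asymmetric min-max uniform quantization: value ~= lo + code * scale."""
    levels = (1 << bits) - 1  # e.g. codes 0..15 for 4-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_uniform(codes: List[int], scale: float, lo: float) -> List[float]:
    return [lo + c * scale for c in codes]


def fake_quantize_per_tensor(w: Matrix, bits: int) -> Matrix:
    """No grouping: one (scale, lo) pair shared by every weight in the matrix."""
    cols = len(w[0])
    flat = [x for row in w for x in row]
    deq = dequantize_uniform(*quantize_uniform(flat, bits))
    return [deq[i * cols:(i + 1) * cols] for i in range(len(w))]


def fake_quantize_blockwise(w: Matrix, bits: int, block: int = 64) -> Matrix:
    """Block grouping: each block x block tile gets its own (scale, lo) pair."""
    rows, cols = len(w), len(w[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            rs = range(r0, min(r0 + block, rows))
            cs = range(c0, min(c0 + block, cols))
            tile = [w[r][c] for r in rs for c in cs]  # row-major within the tile
            deq = iter(dequantize_uniform(*quantize_uniform(tile, bits)))
            for r in rs:
                for c in cs:
                    out[r][c] = next(deq)
    return out


def compare_on_outlier_block(bits: int = 4, n: int = 128, seed: int = 0) -> Tuple[float, float]:
    """Return (per-tensor MSE, 64x64 block MSE) on a synthetic matrix (requires n >= 64)."""
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 0.02) for _ in range(n)] for _ in range(n)]
    for r in range(64):
        for c in range(64):
            w[r][c] *= 50.0  # one 64x64 tile with much larger magnitudes

    def mse(q: Matrix) -> float:
        return sum((a - b) ** 2 for ra, rq in zip(w, q) for a, b in zip(ra, rq)) / (n * n)

    return mse(fake_quantize_per_tensor(w, bits)), mse(fake_quantize_blockwise(w, bits, 64))
```

When one tile has much larger magnitudes, the per-tensor range is dictated by that tile and the remaining tiles lose almost all resolution; per-block scales avoid this at the cost of storing one scale and offset per 64 × 64 block.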
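The README does not spell out the non-uniform dense-and-sparse scheme. Its terminology matches SqueezeLLM, which moves the largest-magnitude weights into a sparse full-precision part and clusters the remaining weights of each row into 2^bits centroids with k-means weighted by a per-weight sensitivity (a diagonal Hessian/Fisher approximation). The sketch below assumes that scheme; the function names, the 0.5% outlier fraction, and the `sensitivity` input are hypothetical, and fgml's actual implementation may differ. It also shows how a row stored this way enters a matrix-vector product at inference time.

```python
from typing import Dict, List, Tuple


def weighted_kmeans_1d(values: List[float], weights: List[float], k: int,
                       iters: int = 25) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm in 1-D; each point pulls its centroid with its own weight (k >= 2)."""
    lo, hi = min(values), max(values)
    # Deterministic init: k evenly spaced centroids over the value range.
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def assign() -> List[int]:
        return [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]

    for _ in range(iters):
        labels = assign()
        num, den = [0.0] * k, [0.0] * k
        for v, wt, a in zip(values, weights, labels):
            num[a] += wt * v
            den[a] += wt
        # Weighted mean per cluster; empty or zero-weight clusters keep their centroid.
        centroids = [num[j] / den[j] if den[j] > 0 else centroids[j] for j in range(k)]
    return centroids, assign()


def dense_sparse_quantize(row: List[float], sensitivity: List[float], bits: int = 3,
                          outlier_frac: float = 0.005):
    """Split one weight row into a sparse full-precision part and a dense LUT-coded part."""
    n = len(row)
    n_out = int(round(outlier_frac * n))
    by_magnitude = sorted(range(n), key=lambda i: abs(row[i]), reverse=True)
    sparse: Dict[int, float] = {i: row[i] for i in by_magnitude[:n_out]}
    dense_idx = [i for i in range(n) if i not in sparse]
    lut, labels = weighted_kmeans_1d([row[i] for i in dense_idx],
                                     [sensitivity[i] for i in dense_idx], 1 << bits)
    codes = [0] * n  # positions held in `sparse` keep a placeholder code 0
    for i, a in zip(dense_idx, labels):
        codes[i] = a
    return lut, codes, sparse


def dense_sparse_dequantize(lut: List[float], codes: List[int],
                            sparse: Dict[int, float]) -> List[float]:
    out = [lut[c] for c in codes]
    for i, v in sparse.items():
        out[i] = v  # outliers are stored exactly
    return out


def dense_sparse_matvec(rows, x: List[float]) -> List[float]:
    """y = W x with each row of W stored as (lut, codes, sparse)."""
    y = []
    for lut, codes, sparse in rows:
        dense = sum(lut[c] * xi for i, (c, xi) in enumerate(zip(codes, x)) if i not in sparse)
        y.append(dense + sum(v * x[i] for i, v in sparse.items()))
    return y
```

Keeping outliers out of the clustering lets the 2^bits centroids cover the bulk of the weight distribution instead of being stretched toward a few extreme values; the price is one index-value pair per outlier.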

License: Apache License 2.0