ggerganov / ggml

Tensor library for machine learning


Possible typo in comment

antirez opened this issue · comments

Hi, in the Q2_K structure there is a comment stating that each weight uses 2.5625 bits:

// 2-bit quantization
// weight is represented as x = a * q + b
// 16 blocks of 16 elements each
// Effectively 2.5625 bits per weight
typedef struct {
    uint8_t scales[QK_K/16]; // scales and mins, quantized with 4 bits
    uint8_t qs[QK_K/4];      // quants
    ggml_fp16_t d;           // super-block scale for quantized scales
    ggml_fp16_t dmin;        // super-block scale for quantized mins
} block_q2_K;

But if I do the math (with QK_K = 256, so 16 bytes of scales, 64 bytes of quants, and two 2-byte fp16 values), I obtain:

block size = 16 + 64 + 4 = 84 bytes, that is 672 bits
bits per weight = 672 / 256 = 2.625

Cheers

Yup, it's a typo - fixed