OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2


int8 quantization not working

homink opened this issue · comments

Hi, it looks like int8 quantization has been mistakenly configured as int8_float32 for a long time. Could anyone please have a look at it?

https://github.com/OpenNMT/CTranslate2/blob/b6daa04795138e823c45d5f99baf7b3426f01bfd/python/cpp/module.cc#L36C27-L36C43
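
For anyone who wants to reproduce this, the set reported by the Python module can be inspected directly; a minimal check (the exact output depends on the host hardware):

  import ctranslate2

  # Compute types the installed build reports for the CPU device.
  # The exact set depends on the host CPU features, so output varies
  # between machines.
  print(ctranslate2.get_supported_compute_types("cpu"))
  # e.g. {'float32', 'int16', 'int8', 'int8_float32'} on a typical x86-64 host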

I think it should be changed as follows.

From:

  if (support_int8) {
    compute_types.emplace("int8");
    compute_types.emplace("int8_float32");

    if (support_float16)
      compute_types.emplace("int8_float16");
    if (support_bfloat16)
      compute_types.emplace("int8_bfloat16");
  }

To:

  if (support_int8) {
    compute_types.emplace("int8");

    if (support_float16)
      compute_types.emplace("int8_float16");
    if (support_bfloat16)
      compute_types.emplace("int8_bfloat16");
  }

I guess this was done on purpose:

case ComputeType::INT8: {
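
If that branch resolves a plain "int8" request into a concrete mixed type, as the snippet suggests (my guess; the snippet above is truncated), the resolved name can be checked from Python; a sketch with a placeholder model path:

  import ctranslate2

  model_path = "ende_ctranslate2/"  # placeholder: path to a converted model

  translator = ctranslate2.Translator(model_path, device="cpu",
                                      compute_type="int8")

  # compute_type reports the type actually in use after resolution; if
  # INT8 falls back to the device's default float type, this would print
  # "int8_float32" on a CPU without float16 support.
  print(translator.compute_type)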