OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2


int8 quantization not working

homink opened this issue · comments

Hi, it looks like int8 quantization has been mistakenly configured as int8_float32 for a long time. Could anyone please have a look at it?

https://github.com/OpenNMT/CTranslate2/blob/b6daa04795138e823c45d5f99baf7b3426f01bfd/python/cpp/module.cc#L36C27-L36C43
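
For anyone who wants to reproduce this, the set reported by the Python module can be inspected directly; a minimal check (the exact output depends on the host hardware):

  import ctranslate2

  # Compute types the installed build reports for the CPU device.
  # The exact set depends on the host CPU features, so output varies
  # between machines.
  print(ctranslate2.get_supported_compute_types("cpu"))
  # e.g. {'float32', 'int16', 'int8', 'int8_float32'} on a typical x86-64 host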

I think it should be changed as follows.

From:

  if (support_int8) {
    compute_types.emplace("int8");
    compute_types.emplace("int8_float32");

    if (support_float16)
      compute_types.emplace("int8_float16");
    if (support_bfloat16)
      compute_types.emplace("int8_bfloat16");
  }

To:

  if (support_int8) {
    compute_types.emplace("int8");

    if (support_float16)
      compute_types.emplace("int8_float16");
    if (support_bfloat16)
      compute_types.emplace("int8_bfloat16");
  }

I guess this was done on purpose:

case ComputeType::INT8: {
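
If that branch resolves a plain "int8" request into a concrete mixed type, as the snippet suggests (my guess; the snippet above is truncated), the resolved name can be checked from Python; a sketch with a placeholder model path:

  import ctranslate2

  model_path = "ende_ctranslate2/"  # placeholder: path to a converted model

  translator = ctranslate2.Translator(model_path, device="cpu",
                                      compute_type="int8")

  # compute_type reports the type actually in use after resolution; if
  # INT8 falls back to the device's default float type, this would print
  # "int8_float32" on a CPU without float16 support.
  print(translator.compute_type)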