marian-nmt / marian

Fast Neural Machine Translation in C++

Home Page:https://marian-nmt.github.io


quantize settings as documented lead to "skipping *-th update due to loss being nan" for all training data input

Sarah-Callies opened this issue · comments

commented

Bug description

Marian version: v1.10.0 6f6d484

When I set the quantization configuration like this:
--sync-sgd --quantize-bits 8 --quantize-optimization-steps 10 --quantize-biases true

the file "marian-master\src\training\graph_group_sync.cpp"
Line 379 will occurs:
skipping *-th update due to loss being nan
because of localLoss.loss is NaN

Are you, by any chance, training a quantized model from scratch?

One option is to train a normal model first, then activate the quantization.
Alternatively, not using --quantize-biases true should fix the issue, and that is the recommended setting anyway.
My recommendation is both: train a model normally, then activate quantization without quantizing the biases.

Also, if we only train for 8-bit, I think --quantize-optimization-steps is not necessary. It was designed for more extreme quantization (4-bit or less). Turning it on should be fine, though it will slow down training.

(see the setting: https://github.com/browsermt/students/blob/master/train-student/finetune/run.me.finetune.example.sh)
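
To illustrate, here is a minimal sketch of that two-step workflow, reusing only flags already shown in this thread; the paths, device, and workspace values are placeholders rather than recommended settings:

# Step 1 (hypothetical paths): train the model normally, with no quantization flags.
./marian/build/marian -d 3 -w 12000 --model ./output/model.npz --sync-sgd --train-sets ./nmt_data/src.bpe ./nmt_data/tgt.bpe --vocabs ./vocab.yml ./vocab.yml

# Step 2: continue training from the saved model with 8-bit quantization enabled,
# leaving --quantize-biases at its default (off) and omitting --quantize-optimization-steps.
./marian/build/marian -d 3 -w 12000 --model ./output/model.npz --sync-sgd --quantize-bits 8 --train-sets ./nmt_data/src.bpe ./nmt_data/tgt.bpe --vocabs ./vocab.yml ./vocab.yml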

commented

I adopted your advice: I activated quantization by setting only "--quantize-bits 16" after training one model normally. The command to start quantization is this:

nohup ./marian/build/marian -d 3 -w 12000 --model ./output2/model.npz --sync-sgd --quantize-bits 16 --train-sets ./nmt_data/en.filter15651781.bpe ./nmt_data/ja.filter15651781.bpe --vocabs ./marian/build/vocab.yml ./marian/build/vocab.yml > je.log2 &

My question is: why is the model size still the same?

That makes an FP32 model that's ready to be 8-bit quantized. The next step is to binarize it.
https://github.com/browsermt/students/tree/master/train-student
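
For reference, a minimal sketch of the binarization step, assuming the browsermt fork's marian-conv tool and an intgemm 8-bit gemm type; the flag names and gemm-type value here are assumptions based on the browsermt toolchain, so check the train-student documentation above for the exact invocation:

# Hypothetical example: convert the FP32 model.npz into a binarized 8-bit model
# (flags assumed, verify against the train-student docs).
./marian/build/marian-conv -f ./output2/model.npz -t ./output2/model.intgemm8.bin --gemm-type intgemm8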

Note: due to stubbornness in marian-nmt/marian-dev#762, you won't get the best 8-bit performance with output-layer quantization. That's in https://github.com/browsermt/marian-dev

commented

What is the correct way to get an 8-bit model? The doc says adding the switch --quantize-bits 8 to the marian command would work.
Q1: Should I use the project called "marian" or "marian-dev"?
Q2: Is it true that adding --quantize-bits 8 to the marian command is enough?
Q3: If not, could you give the correct commands to train an 8-bit model?

There is documentation at https://github.com/browsermt/students/tree/master/train-student; if it's unclear, feel free to file an issue against that repo.