marian-nmt / marian

Fast Neural Machine Translation in C++

Home Page:https://marian-nmt.github.io


quantize settings as documented lead to "skipping *-th update due to loss being nan" for all training data input

Sarah-Callies opened this issue · comments

commented

Bug description

Marian version: v1.10.0 6f6d484

When I set the quantization configuration like this:
--sync-sgd --quantize-bits 8 --quantize-optimization-steps 10 --quantize-biases true

the file "marian-master\src\training\graph_group_sync.cpp"
Line 379 will occurs:
skipping *-th update due to loss being nan
because of localLoss.loss is NaN

Are you, by any chance, training a quantized model from scratch?

One option is to train a normal model first, then activate the quantization.
Alternatively, not using --quantize-biases true should fix the issue, and that is the recommended setting anyway.
My recommendation is both: train a model normally, then activate quantization without quantizing the biases.

Also, if we only train for 8-bit, I think --quantize-optimization-steps is not necessary. It was designed for more extreme quantization (4-bit or less). Turning it on should be fine, though it will slow down training.

(see the setting: https://github.com/browsermt/students/blob/master/train-student/finetune/run.me.finetune.example.sh)
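
To illustrate, here is a minimal sketch of that two-step workflow, reusing only flags already shown in this thread; the paths, device, and workspace values are placeholders rather than recommended settings:

# Step 1 (hypothetical paths): train the model normally, with no quantization flags.
./marian/build/marian -d 3 -w 12000 --model ./output/model.npz --sync-sgd --train-sets ./nmt_data/src.bpe ./nmt_data/tgt.bpe --vocabs ./vocab.yml ./vocab.yml

# Step 2: continue training from the saved model with 8-bit quantization enabled,
# leaving --quantize-biases at its default (off) and omitting --quantize-optimization-steps.
./marian/build/marian -d 3 -w 12000 --model ./output/model.npz --sync-sgd --quantize-bits 8 --train-sets ./nmt_data/src.bpe ./nmt_data/tgt.bpe --vocabs ./vocab.yml ./vocab.yml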

commented

I adopted your advice: I activated quantization by setting only "--quantize-bits 16" after training one model normally. The command to start quantization is this:

nohup ./marian/build/marian -d 3 -w 12000 --model ./output2/model.npz --sync-sgd --quantize-bits 16 --train-sets ./nmt_data/en.filter15651781.bpe ./nmt_data/ja.filter15651781.bpe --vocabs ./marian/build/vocab.yml ./marian/build/vocab.yml > je.log2 &

My question is: why is the model size still the same?

That makes an FP32 model that's ready to be 8-bit quantized. The next step is to binarize it.
https://github.com/browsermt/students/tree/master/train-student
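
For reference, a minimal sketch of the binarization step, assuming the browsermt fork's marian-conv tool and an intgemm 8-bit gemm type; the flag names and gemm-type value here are assumptions based on the browsermt toolchain, so check the train-student documentation above for the exact invocation:

# Hypothetical example: convert the FP32 model.npz into a binarized 8-bit model
# (flags assumed, verify against the train-student docs).
./marian/build/marian-conv -f ./output2/model.npz -t ./output2/model.intgemm8.bin --gemm-type intgemm8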

Note: due to stubbornness in marian-nmt/marian-dev#762, you won't get the best 8-bit performance with output-layer quantization. That's in https://github.com/browsermt/marian-dev

commented

What is the correct way to get an 8-bit model? The doc says adding the switch --quantize-bits 8 to the marian command would work.
Q1: Should I use the project called "marian" or "marian-dev"?
Q2: Is it true that adding --quantize-bits 8 to the marian command is enough?
Q3: If not, could you give the correct commands to train an 8-bit model?

There is documentation at https://github.com/browsermt/students/tree/master/train-student; if it's unclear, feel free to file an issue against that repo.