stas00 / ml-engineering

Machine Learning Engineering Open Book

Home Page: https://stasosphere.com/machine-learning/

Question about changing precision post training

Thytu opened this issue

In the Changing precision post-training section it is stated that:

Using fp16-pretrained model in bf16 regime usually fails - due to overflows [...]

Using bf16-pretrained model in fp16 regime usually works - it will lose some performance on conversion [...]

When reading this statement I consider the following scenario:

model_in_fp16.to(torch.bfloat16)  # Overflow
model_in_bf16.to(torch.float16)   # OK

I'm quite surprised and would have expected the opposite statement: converting weights from fp16 $[-65504; 65504]$ to bf16 $[\approx -3.4 \times 10^{38}; \approx 3.4 \times 10^{38}]$ wouldn't result in an overflow, whereas converting weights from bf16 to fp16 could result in an overflow (or an underflow for very small values).
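
For illustration, a minimal PyTorch sketch (the variable names and values are mine, just picked to show the asymmetry between the two conversions):

import torch
# fp16: max ~65504 but 10 mantissa bits; bf16: max ~3.4e38 but only 7 mantissa bits
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
# bf16 -> fp16: values outside fp16's range overflow to inf
x = torch.tensor(1e30, dtype=torch.bfloat16)
print(x.to(torch.float16))              # inf
# fp16 -> bf16: the range fits, but mantissa bits are dropped
y = torch.tensor(1.0009765625, dtype=torch.float16)  # 1 + 2^-10, exact in fp16
print(y.to(torch.bfloat16))             # rounds to 1.0 - precision lost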

Is there something I'm overlooking or misunderstanding?
Is the term "in bf16 regime" actually implying that it receives bf16 inputs?

Thank you for catching my mistake, Valentin - super appreciate you noticing the reversal!

Fixed here: #44

oh, bf16 regime just means that you do the math in bf16 - either through AMP, or without AMP with the model weights themselves in bf16.
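
A quick sketch of the two variants (toy model and shapes, assuming a CUDA device):

import torch
model = torch.nn.Linear(16, 16).cuda()
x = torch.randn(4, 16, device="cuda")
# variant 1: AMP - weights stay fp32, autocast-eligible ops run in bf16
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(x)
# variant 2: no AMP - the weights themselves are cast to bf16
model_bf16 = model.to(torch.bfloat16)
out = model_bf16(x.to(torch.bfloat16))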

At least in the LM case the inputs are token IDs, not floats, but the inputs to the subsequent layers' forwards are floats.

If you feel that more commentary is needed in that section please kindly suggest what to add.