kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku



Quantization for training / finetuning

torphix opened this issue

Hi!
Thanks for the lib and the tutorial; it is very informative.

With respect to fine-tuning, would it be worth quantizing the model to fp16, or even int8, before beginning training? Could this lead to better accuracy than quantizing after the model has been fine-tuned?
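To make the "quantize before vs. after fine-tuning" distinction concrete, here is a minimal sketch in JAX of one common approach: symmetric per-tensor int8 "fake quantization" applied during training (quantization-aware training), with a straight-through estimator so gradients still reach the full-precision weights. This code is not part of mesh-transformer-jax. The helper names `quantize_int8`, `dequantize_int8` and `fake_quant` are hypothetical, and the toy linear model stands in for a real transformer layer. Quantizing after fine-tuning (post-training quantization) would correspond to training without `fake_quant` and calling `quantize_int8` on the final weights once.

```python
import jax
import jax.numpy as jnp


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: returns int8 codes and a float scale."""
    # Map the largest absolute weight to 127; guard against an all-zero tensor.
    scale = jnp.maximum(jnp.max(jnp.abs(w)), 1e-8) / 127.0
    q = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float32 weights from int8 codes."""
    return q.astype(jnp.float32) * scale


def fake_quant(w):
    """Forward pass sees int8-rounded weights; backward pass treats this as identity."""
    # Quantize a gradient-free copy so nothing is differentiated through the int cast.
    w_q = dequantize_int8(*quantize_int8(jax.lax.stop_gradient(w)))
    # Straight-through estimator: value equals w_q, gradient w.r.t. w is 1.
    return w + jax.lax.stop_gradient(w_q - w)


def loss_fn(w, x, y):
    # Toy linear model whose weights are fake-quantized on every step (QAT-style).
    pred = x @ fake_quant(w)
    return jnp.mean((pred - y) ** 2)


# Toy regression data (assumed for illustration only).
kx, kw = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(kx, (64, 8))
w_true = jax.random.normal(kw, (8, 1))
y = x @ w_true

# Plain SGD on the full-precision weights; the loss always sees quantized ones.
w = jnp.zeros((8, 1))
grad_fn = jax.jit(jax.grad(loss_fn))
for _ in range(200):
    w = w - 0.1 * grad_fn(w, x, y)

# After training, the weights can be stored as int8 codes plus one scale.
q_final, scale_final = quantize_int8(w)
```

Casting to fp16 is simpler by comparison (`w.astype(jnp.float16)`, no scale needed), so the train-with-quantization question matters most for int8, where the rounding error is larger.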

Thanks