[Gemma] GELU should be approx tanh not exact
NeonBohdan opened this issue
After the initial release of Gemma, several fixes were made to make it stable (as it should have been from the start):
https://unsloth.ai/blog/gemma-bugs
Here is the summary
Items 3-7 look like per-layer dtype adjustments, which don't apply to CTranslate2, since it has no per-layer dtype settings.
Item 8 may be very useful and was a widely discussed issue for Gemma:
huggingface/transformers#29729
Is it possible to use approximate gelu for Gemma in ctranslate2?
I see GeluTanh already implemented in ctranslate2
CTranslate2/src/ops/activation.cc, line 21 (commit 8994330)
So the fix should just be a converter/config update here to use gelu_pytorch_tanh.
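For reference, the two variants are numerically close but not identical, which is why using the wrong one degrades a model trained with the tanh form. A minimal sketch of exact GELU versus the tanh approximation (the formula behind HF's gelu_pytorch_tanh name); this is illustrative, not CTranslate2's implementation:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with the Gaussian CDF computed via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation (what "gelu_pytorch_tanh" selects in HF configs).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

# Maximum absolute difference over [-4, 4]: small but nonzero.
diff = max(abs(gelu_exact(x) - gelu_tanh(x))
           for x in (i / 10 for i in range(-40, 41)))
print(diff)
```

The per-activation error is tiny, but it compounds across layers, which is why a model trained with the tanh form should be served with the tanh form.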
Hello, thank you for the information. I will update the converter ASAP. Regarding point 7: we create RoPE following the dtype of the input (not always float32), but I think it should be fine if my understanding is correct.
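To illustrate why point 7 (RoPE precision) comes up at all, here is a hedged NumPy sketch comparing rotary angle tables built in float32 versus float16. The function name and shapes are illustrative assumptions, not CTranslate2's API; the point is only that half-precision angle computation drifts at large positions, which is why frameworks often precompute RoPE in float32 and cast afterwards:

```python
import numpy as np

def rope_angles(positions: np.ndarray, dim: int, dtype) -> np.ndarray:
    # Illustrative RoPE angle table: angles[p, i] = p / 10000^(2i/dim).
    inv_freq = 1.0 / (10000.0 ** (np.arange(0, dim, 2, dtype=dtype) / dim))
    return np.outer(positions.astype(dtype), inv_freq.astype(dtype))

pos = np.arange(0, 8192)
a32 = rope_angles(pos, 128, np.float32)   # compute in float32
a16 = rope_angles(pos, 128, np.float16)   # compute directly in float16

# The absolute error of the float16 table grows with the position index,
# because large position values lose integer precision in half precision.
err = float(np.max(np.abs(a32 - a16.astype(np.float32))))
print(err)
```

Computing the table in float32 and casting the rotated activations back to the input dtype keeps the angles accurate while still matching the model's compute dtype, which is the usual resolution of this kind of issue.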