OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2

[Gemma] GELU should be approx tanh not exact

NeonBohdan opened this issue

Since the initial release of Gemma, a number of fixes have been made to make it work as it should have from the start.

https://unsloth.ai/blog/gemma-bugs
Here is the summary

Fixes 3-7 look like per-layer dtype tweaks, which don't apply to ctranslate2, as there are no per-layer dtype settings.

Fix 8 may be very useful and was a widely discussed issue for Gemma:
huggingface/transformers#29729

Is it possible to use the approximate (tanh) GELU for Gemma in ctranslate2?
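
For context, here is a quick, purely illustrative comparison of the two variants in plain Python (standard GELU formulas, nothing CTranslate2-specific):

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation used by "gelu_pytorch_tanh" (what Gemma expects).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

for x in (-2.0, -0.5, 0.5, 2.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```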

I see GELUTanh is already implemented in ctranslate2:

case ActivationType::GELUTanh: {

So the fix should just be a converter update here, mapping gelu_pytorch_tanh to the tanh variant:

activation=common_spec.Activation.GELU,
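
A minimal sketch of what the converter-side mapping could look like, assuming the Python spec exposes a GELUTanh member mirroring the C++ ActivationType::GELUTanh (the Python enum name here is an assumption):

```python
from ctranslate2.specs import common_spec

# Hypothetical mapping from the HF config's activation name to the
# CTranslate2 activation enum; "gelu_pytorch_tanh" (Gemma) should go to
# the tanh variant instead of exact GELU. GELUTanh on the Python side is
# an assumption mirroring the C++ ActivationType::GELUTanh.
_HF_TO_CT2_ACTIVATION = {
    "gelu": common_spec.Activation.GELU,                   # exact GELU
    "gelu_pytorch_tanh": common_spec.Activation.GELUTanh,  # tanh approximation
}

def ct2_activation(hf_name: str):
    return _HF_TO_CT2_ACTIVATION[hf_name]
```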

Hello, thank you for the information. I will update the converter ASAP. Regarding point 7: we create RoPE following the dtype of the input (not always float32), but I think it should be fine, if my understanding is correct.
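
For reference on point 7, the concern in the blog is the precision of the RoPE tables; a purely illustrative sketch (not CTranslate2 internals) of building them in float32 and only casting to the working dtype at the end:

```python
import numpy as np

def rope_tables(seq_len: int, dim: int, base: float = 10000.0, dtype=np.float16):
    # Compute positions and inverse frequencies in float32 so large positions
    # keep their precision, then cast the final cos/sin tables to the working dtype.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2, dtype=np.float32) / dim))
    angles = np.outer(np.arange(seq_len, dtype=np.float32), inv_freq)
    return np.cos(angles).astype(dtype), np.sin(angles).astype(dtype)

cos, sin = rope_tables(seq_len=4096, dim=128)
```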