OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2

[Gemma] GELU should be approx tanh not exact

NeonBohdan opened this issue

Since the initial release of Gemma, a number of fixes have been made to make it work as it should have from the start.

https://unsloth.ai/blog/gemma-bugs
Here is the summary

Fixes 3-7 look like per-layer dtype tweaks, which don't apply to ctranslate2, as there are no per-layer dtype settings.

Fix 8 may be very useful and was a widely discussed issue for Gemma:
huggingface/transformers#29729

Is it possible to use the approximate (tanh) GELU for Gemma in ctranslate2?
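
For context, here is a quick, purely illustrative comparison of the two variants in plain Python (standard GELU formulas, nothing CTranslate2-specific):

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation used by "gelu_pytorch_tanh" (what Gemma expects).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

for x in (-2.0, -0.5, 0.5, 2.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```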

I see GELUTanh is already implemented in ctranslate2:

case ActivationType::GELUTanh: {

So the fix should just be a converter update here, mapping gelu_pytorch_tanh to the tanh variant:

activation=common_spec.Activation.GELU,
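
A minimal sketch of what the converter-side mapping could look like, assuming the Python spec exposes a GELUTanh member mirroring the C++ ActivationType::GELUTanh (the Python enum name here is an assumption):

```python
from ctranslate2.specs import common_spec

# Hypothetical mapping from the HF config's activation name to the
# CTranslate2 activation enum; "gelu_pytorch_tanh" (Gemma) should go to
# the tanh variant instead of exact GELU. GELUTanh on the Python side is
# an assumption mirroring the C++ ActivationType::GELUTanh.
_HF_TO_CT2_ACTIVATION = {
    "gelu": common_spec.Activation.GELU,                   # exact GELU
    "gelu_pytorch_tanh": common_spec.Activation.GELUTanh,  # tanh approximation
}

def ct2_activation(hf_name: str):
    return _HF_TO_CT2_ACTIVATION[hf_name]
```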

Hello, thank you for the information. I will update the converter ASAP. Regarding point 7: we create RoPE following the dtype of the input (not always float32), but I think it should be fine, if my understanding is correct.
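
For reference on point 7, the concern in the blog is the precision of the RoPE tables; a purely illustrative sketch (not CTranslate2 internals) of building them in float32 and only casting to the working dtype at the end:

```python
import numpy as np

def rope_tables(seq_len: int, dim: int, base: float = 10000.0, dtype=np.float16):
    # Compute positions and inverse frequencies in float32 so large positions
    # keep their precision, then cast the final cos/sin tables to the working dtype.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2, dtype=np.float32) / dim))
    angles = np.outer(np.arange(seq_len, dtype=np.float32), inv_freq)
    return np.cos(angles).astype(dtype), np.sin(angles).astype(dtype)

cos, sin = rope_tables(seq_len=4096, dim=128)
```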