OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2


Please support gemma arch

NeonBohdan opened this issue · comments

It's a unique model.
It has a 256K tokenizer like mT5, but is decoder-only.
So I'm hoping for good multilingual capabilities compared to models using the Llama tokenizer.

Hoping it will be easy enough to add (like Llama -> Mistral).
As I see it, this project has become harder to maintain,
but it's better than llama.cpp or vLLM in my opinion.

https://huggingface.co/google/gemma-7b-it

Maybe this will help:
ggerganov/llama.cpp#5631
vllm-project/vllm#2960

Gemma will be supported soon with #1631.
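Once that lands, converting the checkpoint should presumably follow the usual CTranslate2 workflow; a sketch, assuming the converter recognizes the Gemma architecture (the output directory name is illustrative):

```shell
# Install/upgrade the inference engine and the converter's dependency.
pip install --upgrade ctranslate2 transformers

# Convert the Hugging Face checkpoint to the CTranslate2 format.
ct2-transformers-converter --model google/gemma-7b-it --output_dir gemma-7b-it-ct2
```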

Hi @NeonBohdan, have you tried Gemma with CTranslate2? Does it generate the same output as Transformers? It seems to start with a good generation, and then continues with repeated words. However, I might be missing something.

@ymoslem, I haven't compiled CTranslate2 to test it yet; I'm waiting for the release.
However, there does seem to be an issue with Gemma-it compared to Mistral.
The issue gets worse with quantization.

You can try using a repetition penalty, but overall, I've observed this problem as well.
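For reference, a repetition penalty rescales the scores of tokens that have already been generated so they become less likely to be sampled again. A minimal sketch of the common CTRL-style rule (positive logits divided by the penalty, negative logits multiplied), in plain Python; this illustrates the idea only and is not CTranslate2's internal code:

```python
def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Return a copy of `logits` with previously generated tokens penalized.

    Positive logits are divided by `penalty` and negative logits are
    multiplied by it, so either way the penalized score decreases
    (for penalty > 1).
    """
    out = list(logits)
    for token_id in set(previous_tokens):
        if out[token_id] > 0:
            out[token_id] = out[token_id] / penalty
        else:
            out[token_id] = out[token_id] * penalty
    return out

# Example: tokens 0 and 1 were already generated, penalty of 2.0.
penalized = apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], 2.0)
# Token 0: 2.0 -> 1.0; token 1: -1.0 -> -2.0; token 2 unchanged.
```

Note that a penalty like this only suppresses exact token repeats; if the underlying issue is quantization degrading the logits, it can mask the symptom without fixing the cause.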

Thanks, @NeonBohdan, for your response! I tried repetition_penalty, but it does not seem to help.
I suspect it may be an issue with quantization. I will try without it and see.
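One way to isolate the quantization effect is to convert the model twice and compare generations from each. A sketch, assuming the converter supports Gemma; output directory names are illustrative:

```shell
# Full-precision (float16) conversion, no weight quantization.
ct2-transformers-converter --model google/gemma-7b-it \
    --output_dir gemma-7b-it-ct2-fp16 --quantization float16

# int8-quantized conversion for comparison.
ct2-transformers-converter --model google/gemma-7b-it \
    --output_dir gemma-7b-it-ct2-int8 --quantization int8
```

If the repetition only appears with the int8 model, that points to quantization rather than the architecture port itself.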