Please support gemma arch
NeonBohdan opened this issue · comments
It's a unique model
With a 256K tokenizer like MT5, but decoder-only
So I'm hoping for good multilingual capabilities compared to Llama-tokenizer models
Hoping it will be easy enough to add (like llama -> mistral)
As I see it, this project is now harder to maintain
But it's better than llama.cpp or vLLM in my opinion
https://huggingface.co/google/gemma-7b-it
Maybe this will help:
ggerganov/llama.cpp#5631
vllm-project/vllm#2960
Hi @NeonBohdan Have you tried Gemma with CTranslate2? Does it generate the same output as Transformers? It seems it starts with a good generation, and then continues with repeated words. However, I might be missing something.
@ymoslem, I haven't compiled ctranslate2 to test it yet, and I'm waiting for the release.
However, there seems to be an issue with Gemma-it, compared to Mistral.
The problem gets worse with quantization.
You can try using a repetition penalty, but overall, I've observed this problem as well.
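For context, a repetition penalty rescales the logits of tokens that have already been generated so they become less likely to be sampled again (CTranslate2 exposes this as the `repetition_penalty` option of `generate_batch`). A minimal plain-Python sketch of the idea, not CTranslate2's actual implementation:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Penalize tokens that already appeared in the output.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so previously generated tokens always lose
    probability mass (penalty > 1.0 means stronger penalization).
    """
    penalized = list(logits)
    for token_id in set(generated_ids):
        if penalized[token_id] > 0:
            penalized[token_id] /= penalty
        else:
            penalized[token_id] *= penalty
    return penalized

# Example: tokens 0 and 1 were already generated, token 2 was not.
print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], 2.0))
# → [1.0, -2.0, 0.5]
```

Values around 1.1-1.3 are a common starting point; pushing the penalty too high can degrade fluency instead of fixing the loop.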
Thanks, @NeonBohdan, for your response! I tried `repetition_penalty`, but it does not seem to help.
I suspect it could be an issue with quantization. I will try without it and see.