OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2


Please support gemma arch

NeonBohdan opened this issue · comments

It's a unique model.
It has a 256K tokenizer like mT5, but is decoder-only.
So I'm hoping for good multilingual capabilities compared to models using the Llama tokenizer.

Hoping it will be easy enough to add (like Llama -> Mistral).
As I see it, this project has become harder to maintain,
but it's better than llama.cpp or vLLM in my opinion.

https://huggingface.co/google/gemma-7b-it

Maybe this will help:
ggerganov/llama.cpp#5631
vllm-project/vllm#2960

Gemma will be supported soon with #1631.
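Once that lands, converting the checkpoint should presumably follow the usual CTranslate2 workflow; a sketch, assuming the converter recognizes the Gemma architecture (the output directory name is illustrative):

```shell
# Install/upgrade the inference engine and the converter's dependency.
pip install --upgrade ctranslate2 transformers

# Convert the Hugging Face checkpoint to the CTranslate2 format.
ct2-transformers-converter --model google/gemma-7b-it --output_dir gemma-7b-it-ct2
```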

Hi @NeonBohdan, have you tried Gemma with CTranslate2? Does it generate the same output as Transformers? It seems to start with a good generation, and then continues with repeated words. However, I might be missing something.

@ymoslem, I haven't compiled CTranslate2 to test it yet; I'm waiting for the release.
However, there does seem to be an issue with Gemma-it compared to Mistral.
The issue gets worse with quantization.

You can try using a repetition penalty, but overall, I've observed this problem as well.
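For reference, a repetition penalty rescales the scores of tokens that have already been generated so they become less likely to be sampled again. A minimal sketch of the common CTRL-style rule (positive logits divided by the penalty, negative logits multiplied), in plain Python; this illustrates the idea only and is not CTranslate2's internal code:

```python
def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Return a copy of `logits` with previously generated tokens penalized.

    Positive logits are divided by `penalty` and negative logits are
    multiplied by it, so either way the penalized score decreases
    (for penalty > 1).
    """
    out = list(logits)
    for token_id in set(previous_tokens):
        if out[token_id] > 0:
            out[token_id] = out[token_id] / penalty
        else:
            out[token_id] = out[token_id] * penalty
    return out

# Example: tokens 0 and 1 were already generated, penalty of 2.0.
penalized = apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], 2.0)
# Token 0: 2.0 -> 1.0; token 1: -1.0 -> -2.0; token 2 unchanged.
```

Note that a penalty like this only suppresses exact token repeats; if the underlying issue is quantization degrading the logits, it can mask the symptom without fixing the cause.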

Thanks, @NeonBohdan, for your response! I tried repetition_penalty, but it does not seem to help.
I suspect it may be an issue with quantization. I will try without it and see.
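One way to isolate the quantization effect is to convert the model twice and compare generations from each. A sketch, assuming the converter supports Gemma; output directory names are illustrative:

```shell
# Full-precision (float16) conversion, no weight quantization.
ct2-transformers-converter --model google/gemma-7b-it \
    --output_dir gemma-7b-it-ct2-fp16 --quantization float16

# int8-quantized conversion for comparison.
ct2-transformers-converter --model google/gemma-7b-it \
    --output_dir gemma-7b-it-ct2-int8 --quantization int8
```

If the repetition only appears with the int8 model, that points to quantization rather than the architecture port itself.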