michaelfeil / hf-hub-ctranslate2

Connecting Transformers on HuggingFace Hub with CTranslate2

Home Page: https://michaelfeil.github.io/hf-hub-ctranslate2/


hf_hub_ctranslate2

Connecting Transformers on the Hugging Face Hub with CTranslate2 - a small utility for keeping the tokenizer and the converted model together on the Hugging Face Hub.


Read the docs



Usage:

PyPI Install

pip install hf-hub-ctranslate2
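
To verify the install, a quick import check (plain standard library; it assumes only the package name from the pip command above):

import importlib.metadata

# the distribution name is the same as the pip package above
print(importlib.metadata.version("hf-hub-ctranslate2"))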

Decoder-only Transformer:

# download ctranslate2.Generator repos from the Hugging Face Hub (GPT-J, ...)
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

model_name_1 = "michaelfeil/ct2fast-pythia-160m"
model = GeneratorCT2fromHfHub(
    # load in int8 on CPU
    model_name_or_path=model_name_1, device="cpu", compute_type="int8"
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "User: How are you doing?"],
    # add arguments specific to ctranslate2.Generator here
)
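
Additional keyword arguments are forwarded to the underlying ctranslate2.Generator. A minimal sketch, assuming the standard ctranslate2 generation options (max_length, sampling_topk, sampling_temperature) are passed through unchanged:

outputs = model.generate(
    text=["def fibonacci(n):"],
    # standard ctranslate2.Generator options, assumed to be forwarded as-is
    max_length=64,
    sampling_topk=40,
    sampling_temperature=0.7,
)
print(outputs)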

Encoder-Decoder:

from hf_hub_ctranslate2 import TranslatorCT2fromHfHub
# download ctranslate2.Translator repos from the Hugging Face Hub (T5, ...)
model_name_2 = "michaelfeil/ct2fast-flan-alpaca-base"
model = TranslatorCT2fromHfHub(
    # load in int8_float16 on CUDA
    model_name_or_path=model_name_2, device="cuda", compute_type="int8_float16"
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "Translate to german: How are you doing?"],
    # use arguments specifically to ctranslate2.Translator below:
    min_decoding_length=8,
    max_decoding_length=16,
    max_input_length=512,
    beam_size=3
)
print(outputs)

Encoder-Decoder for multilingual translations (m2m-100):

from transformers import AutoTokenizer
from hf_hub_ctranslate2 import MultiLingualTranslatorCT2fromHfHub

model = MultiLingualTranslatorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-m2m100_418M", device="cpu", compute_type="int8",
    tokenizer=AutoTokenizer.from_pretrained("facebook/m2m100_418M")
)

outputs = model.generate(
    ["How do you call a fast Flamingo?", "Wie geht es dir?"],
    src_lang=["en", "de"],
    tgt_lang=["de", "fr"]
)
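
Assuming generate returns one translated string per input (as in the examples above), the results can be paired back with their language codes:

for src, tgt, out in zip(["en", "de"], ["de", "fr"], outputs):
    print(f"{src} -> {tgt}: {out}")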

Encoder-only Sentence Transformers

Feel free to try out a newer repo that uses CTranslate2 for vector embeddings: https://github.com/michaelfeil/infinity

from hf_hub_ctranslate2 import CT2SentenceTransformer
model_name_pytorch = "intfloat/e5-small"
model = CT2SentenceTransformer(
    model_name_pytorch, compute_type="int8", device="cuda"
)
embeddings = model.encode(
    ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    batch_size=32,
    convert_to_numpy=True,
    normalize_embeddings=True,
)
print(embeddings.shape, embeddings)
# embeddings are normalized, so this is cosine similarity scaled by 100
scores = (embeddings @ embeddings.T) * 100
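
As a small follow-up, the score matrix can rank which sentence is closest to a query sentence; this is plain NumPy on the normalized embeddings, nothing specific to this library:

import numpy as np

# rank the remaining sentences by similarity to sentence 0
best = int(np.argmax(scores[0, 1:])) + 1
print(f"Most similar to sentence 0: sentence {best}")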

Encoder-only (no longer recommended)

from hf_hub_ctranslate2 import EncoderCT2fromHfHub
model_name = "michaelfeil/ct2fast-e5-small"
model = EncoderCT2fromHfHub(
    # load in int8_float16 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
    max_length=64,
)
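
The encoder path returns raw model outputs rather than pooled sentence embeddings, which is presumably why CT2SentenceTransformer above is preferred. The dictionary keys below are an assumption about the return value, not a verified API:

# assumed output keys - inspect the returned object in your version
print(outputs["pooler_output"].shape)
print(outputs["last_hidden_state"].shape)
print(outputs["attention_mask"].shape)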

License: MIT

