OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2

Inference with CTranslate2 using tensor parallelism and MPI

mohith56 opened this issue

import ctranslate2
import transformers

# Load the converted OPT-1.3B model with tensor parallelism enabled.
generator = ctranslate2.Generator("/ct2opt-1.3b", tensor_parallel=True, device="cuda")
tokenizer = transformers.AutoTokenizer.from_pretrained("facebook/opt-1.3b")

def generate_text(prompts):
    outputs = []
    for prompt in prompts:
        # Tokenize the prompt into the token strings expected by generate_batch.
        start_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
        results = generator.generate_batch([start_tokens], max_length=30, include_prompt_in_result=False)
        outputs.append(tokenizer.decode(results[0].sequences_ids[0]))
    return outputs

text = ["Hello, I am"]
results = generate_text(text)
print(results)

I am getting 4 outputs when I run this script with: mpirun -np 4 python3 ffctranslateload.py,
and the results are very bad, even though the model is distributed. When I run with -np 1 the results are good. How do I get good results with -np 4?
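
Regarding the 4 duplicated outputs: under mpirun, every MPI rank executes the whole script, so each of the 4 processes decodes and prints its own copy of the result. A minimal sketch of a rank-0 guard follows, assuming the MpiInfo helper described in the CTranslate2 tensor-parallelism documentation; verify that it exists in your installed version.

import ctranslate2

# All ranks run the script under mpirun, but only the master rank
# should handle and print the final result when tensor_parallel=True.
# NOTE: ctranslate2.MpiInfo.getCurRank() is assumed here from the
# CTranslate2 tensor-parallelism docs; confirm it in your version.
if ctranslate2.MpiInfo.getCurRank() == 0:
    print(results)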

Which version of CT2 did you use? Try CT2 version 4.2.1 or 4.3.1.
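
For example, one of the suggested releases can be pinned from PyPI (assuming a pip-based environment):

pip install ctranslate2==4.3.1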