How to use a GPU to accelerate inference?

Question

13990008036 opened this issue a year ago

How can I use a GPU to accelerate inference? I deployed the model exactly as described in the "How to use" instructions on GitHub, but inference is still very slow and does not run on the GPU.

跃跃欲试 · Answer 1 · Mon Aug 07 2023 01:42:19 GMT+0800 (China Standard Time)

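import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# The tokenizer ships custom code, so trust_remote_code=True is required
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen25-7b-mono", trust_remote_code=True)
# Load the model; without an explicit device it is placed on the CPU
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen25-7b-mono")
# Tokenize the prompt into PyTorch tensors (these also live on the CPU)
inputs = tokenizer("# this function prints hello world", return_tensors="pt")
# Generate up to 128 tokens in total (prompt included) and decode the result
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))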
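
The snippet above never moves the model or the inputs off the CPU, which would explain the slow inference described in the question. A minimal sketch of GPU placement follows, assuming a CUDA-enabled PyTorch build and a GPU with enough memory for a 7B model in half precision. The helper names `pick_device` and `generate_on_device` are hypothetical and chosen for illustration; they are not part of the repository.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "Salesforce/codegen25-7b-mono"


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    return "cuda" if torch.cuda.is_available() else "cpu"


def generate_on_device(prompt: str, max_length: int = 128) -> str:
    """Hypothetical helper: load the model on the chosen device and generate."""
    device = pick_device()
    # Assumption: float16 on the GPU to reduce memory use; float32 on the CPU
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)
    model = model.to(device)  # move the weights to the GPU (no-op on CPU)

    # The input tensors must be on the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    sample = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(sample[0])
```

Calling `generate_on_device("# this function prints hello world")` would run the same prompt as above. If `pick_device()` returns "cpu" on a machine that has a GPU, the installed PyTorch build most likely lacks CUDA support, and no amount of `.to("cuda")` calls will help until that is fixed.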