QwenLM / qwen.cpp

C++ implementation of Qwen-LM

After installing the Python binding, how can I run inference on the CPU only?

zzzcccxx opened this issue

I used the following code:

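from qwen_cpp import Pipeline

# Load the converted GGML model and its tiktoken vocabulary file.
pipeline = Pipeline("../qwen.cpp/qwen1-8b-ggml.bin", "../qwen_1_8b/qwen.tiktoken")

# With stream=True, chat() yields the reply piece by piece.
result2 = pipeline.chat(["Hello"], stream=True)
for item in result2:
    print(item)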

However, inference runs on all GPUs at once. How can I make it run on the CPU only?
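
One thing that may be worth trying is hiding the GPUs from the process before the binding is loaded. The sketch below sets the standard CUDA environment variable `CUDA_VISIBLE_DEVICES` to an empty string, which makes the CUDA runtime report no visible devices. The helper names `hide_cuda_devices` and `chat_on_cpu` are hypothetical, and it is an assumption that a GPU-enabled build of qwen.cpp falls back to the CPU cleanly when no device is visible. If it does not, reinstalling the binding without the GPU backend enabled at build time is likely the more reliable route.

```python
import os


def hide_cuda_devices() -> None:
    """Hide all CUDA devices from libraries loaded later in this process."""
    # An empty value makes the CUDA runtime report zero visible devices.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# This must run before qwen_cpp is imported, because the variable is
# read when the CUDA runtime initializes.
hide_cuda_devices()


def chat_on_cpu(model_path: str, tokenizer_path: str, prompt: str) -> None:
    """Hypothetical helper: stream one reply using the calls from the question."""
    # Imported lazily so the environment variable is already set.
    # Assumption: the build tolerates having no visible GPU.
    from qwen_cpp import Pipeline

    pipeline = Pipeline(model_path, tokenizer_path)
    for piece in pipeline.chat([prompt], stream=True):
        print(piece, end="", flush=True)
```

With the paths from the question, this would be called as `chat_on_cpu("../qwen.cpp/qwen1-8b-ggml.bin", "../qwen_1_8b/qwen.tiktoken", "Hello")`. The same effect can be had without touching the code by launching the script as `CUDA_VISIBLE_DEVICES="" python script.py`.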