chatglm3-6b-32k cannot run inference after being accelerated with fastllm

Question

JinXuan0604 opened this issue 7 months ago

I am using the stock chatglm3-6b-32k model (no fine-tuning) and only accelerating it with fastllm:

import time, torch, os
from transformers import AutoModel, AutoTokenizer
from fastllm_pytools import llm

model_path = "/root/.cache/modelscope/hub/ZhipuAI/chatglm3-6b-32k"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).half().cuda()

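# Query: "Where do I register data in Data Factory?"
query = "数据工厂在哪注册数据啊？"
# Few-shot prompt (in Chinese): "You are a query-understanding and analysis expert. Your task is to
# extract only the keywords from the query I give you. The keywords should capture the query's meaning,
# and removing the non-keywords should not affect understanding." Two examples follow (a Spark job
# failing with "no read permission" although the space has read permission on the table, and a
# MySQL-to-Iceberg real-time integration job that throws a null pointer after starting), then the query.
prompt = """你是一个query理解分析专家。你的任务是仅从我给的query中抽取出其中的关键词，关键词能表征query的含义，去掉非关键词不影响语义理解。
参考示例：
query：'辛苦看下这个Spark任务报错无读取权限，但空间有对应表读取权限'
关键词：#Spark#报错#读取#权限#表#空间#；
query：'mysql实时集成到iceberg作业，启动后报空指针，这是什么原因呢'
关键词：#mysql#iceberg#集成#空指针#实时#。
query: {} """.format(query)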

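# Greedy decoding with the original HF model (temperature is ignored when do_sample=False)
out, _ = model.chat(tokenizer, prompt, do_sample=False, temperature=0.9, max_length=200)
# Output of the 32k model: 关键词：#数据工厂#注册#数据#  (Keywords: #Data Factory#register#data#)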

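# Convert the HF model to a fastllm model in float16, free the original GPU copy,
# and save the converted model as a .flm file
new_model = llm.from_hf(model, tokenizer, dtype="float16")
del model
torch.cuda.empty_cache()
new_model.save("/root/code/zhoujinxuan/chatglm3-6b-32k.flm")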
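The prompt asks the model to answer in the `关键词：#kw1#kw2#` form used by the few-shot examples. When checking whether the fastllm-converted model still answers correctly, comparing the extracted keyword sets is more forgiving than comparing raw strings. This is a minimal sketch; `parse_keywords` and `same_keywords` are hypothetical helpers, not part of fastllm or transformers, and they assume the model follows the example format (optional `关键词` label, `#`-separated keywords, optional trailing `；` or `。`).

```python
import re


def parse_keywords(output: str) -> list[str]:
    # Remove an optional leading "关键词" label followed by a full-width or ASCII colon.
    body = re.sub(r"^\s*关键词\s*[：:]", "", output.strip())
    # Trim whitespace, then trailing punctuation used in the few-shot examples.
    body = body.strip().rstrip("；;。.").strip()
    keywords: list[str] = []
    for kw in body.split("#"):
        kw = kw.strip()
        # Skip empty pieces from leading/trailing '#' and drop duplicates, keeping order.
        if kw and kw not in keywords:
            keywords.append(kw)
    return keywords


def same_keywords(a: str, b: str) -> bool:
    # Order-insensitive comparison of the keywords in two model responses.
    return set(parse_keywords(a)) == set(parse_keywords(b))
```

For example, `parse_keywords("关键词：#数据工厂#注册#数据#")` yields `["数据工厂", "注册", "数据"]`, so the HF output `out` can be compared against the converted model's response with `same_keywords`.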

TylunasLi · Answer 1 · Sat Jan 13 2024 11:13:09 GMT+0800 (China Standard Time)

I have submitted PR #400; it is waiting to be merged.

TylunasLi · Answer 2 · Mon Jan 15 2024 23:46:56 GMT+0800 (China Standard Time)

The code has been merged; please test it.
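To retest after the merge, one option is to run the same prompts through the original HF model and the converted model and list the ones that differ. The sketch below takes both generators as plain callables so it assumes no particular fastllm API; `find_mismatches` is a hypothetical helper. Even when the conversion works, float16 numerical differences can make greedy outputs diverge slightly, so treat mismatches as leads to inspect rather than proof of a bug.

```python
from typing import Callable, List, Tuple


def find_mismatches(
    prompts: List[str],
    reference: Callable[[str], str],
    candidate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (prompt, reference_output, candidate_output) for each prompt whose
    whitespace-stripped outputs differ. An empty candidate output is always flagged."""
    mismatches = []
    for p in prompts:
        ref = reference(p).strip()
        cand = candidate(p).strip()
        if not cand or ref != cand:
            mismatches.append((p, ref, cand))
    return mismatches
```

As an illustration, `reference` could wrap `model.chat(tokenizer, p, do_sample=False, max_length=200)[0]` (called before `del model`), and `candidate` could wrap `response(p)` on a model reloaded with `llm.model(...)` from the saved `.flm` file; those fastllm calls follow its README and should be checked against the installed version.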