ztxz16 / fastllm

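A pure C++ cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models can reach 10000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices.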

ztxz16/fastllm Issues

chatglm loses its function calling ability
Updated 9 days ago
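How do I install fastllm on the domestic accelerators Ascend 910 and Hygon DCU?
Updated 10 days ago · 1 comment
Error when running a model after compiling
Updated 16 days ago · 1 comment
When will deployment code for GLM4-V-9B be released?
Updated a month ago
How to deploy on multiple GPUs
Updated a month ago · 1 comment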
OSError: libcublas.so.ll: cannot open shared odject file: No such file or directory
Updated 2 months ago · 1 comment
Meta-Llama-3-70B-Instruct
Updated 2 months ago · 5 comments
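Error during make -j
Updated 2 months ago · 3 comments
When will GLM-4 be supported?
Closed 2 months ago · 4 comments
GLM-4-6B-Chat cannot be loaded after conversion to flm format
Closed 2 months ago · 5 comments
Is deepseekv2 quantization currently supported?
Closed 3 months ago · 1 comment
H800 docker build: compile error in half type conversion
Closed 3 months ago · 1 comment
qwen1.5 int4 model replies hit a decoding problem: UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 72-73: invalid continuation byte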
Updated 3 months ago
[CMakeFiles/Makefile2:100: CMakeFiles/pyfastllm.dir/all]
Updated 4 months ago
The returned result is always <unk>
Updated 4 months ago · 1 comment
chatglm3 generates identical results for the same prompt
Updated 4 months ago
Do you have a plan to implement the CudaCatOp?
Updated 4 months ago
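Chinese input is not recognized; the address opened by the webui is unreachable.
Closed 4 months ago · 1 comment
Qwen qwen1.5-14B-chat decoding error
Updated 4 months ago · 2 comments
cmake -j error
Updated 4 months ago · 2 comments
Unable to install fastllm_pytools
Updated 4 months ago · 1 comment
Streaming output interruption problem
Updated 5 months ago
Is it not possible to use an already-quantized model for model conversion?
Updated 5 months ago · 1 comment
Is the qwen1.5 sliding window approach supported?
Updated 5 months ago
Hello, will the performance be better than chatglm.cpp?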
Updated 5 months ago
Error: cublas error during MatMul in Attention operator.
Closed 5 months ago · 3 comments
Does fastllm support the chatglm3-6b-base int4 model quantized with bitsandbytes?
Updated 5 months ago
/api/chat_stream The result returned by postman is empty
Updated 5 months ago
chatGLM6b saving: CUDA error when release memory!
Closed 7 months ago · 1 comment
ResponseBatch returns incorrect results
Updated 6 months ago · 5 comments
BAICHUAN2 has no MakeInput implementation
Closed 6 months ago · 7 comments
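Request to support Grouped Query Attention
Closed 6 months ago
Code related to batch padding mask handling
Closed 6 months ago
qwen output is incorrect
Closed 6 months ago · 1 comment
How to contribute code?
Closed 6 months ago
Question for the author: is there corresponding Python code or a link for the tokenizer encode and decode part?
Closed 6 months ago · 1 comment
Could ChatGLM3 multi-turn conversation be supported later?
Updated 6 months ago · 2 comments
PEFT currently supports only chatglm. When will other models such as baichuan2 be supported? Or which parts would need to change? Happy to contribute.
Updated 6 months ago · 1 comment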
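When converting the model format (.bin->.flm)
Updated 7 months ago · 2 comments
Why does utilization only reach 60%?
Updated 7 months ago · 2 comments
Error when the output is especially long.
Closed 7 months ago · 2 comments
Reporting a chatGLM3 function_call bug
Closed 7 months ago · 1 comment
Error when using on the macOS Intel platform
Closed 7 months ago · 3 comments
Results from fastllm inference differ from results from transformers inference.
Closed 7 months ago · 1 comment
make_input and model.weight.tokenizer.encode produce extra spaces
Updated 7 months ago · 3 comments
Unstable latency for the first token from fetch_response
Updated 7 months ago
Several C++ example programs all crash immediately with a segmentation fault
Updated 7 months ago · 1 comment
chatglm3-6b-32k cannot run inference after fastllm acceleration
Updated 7 months ago · 2 comments
Suggestion: add a model.device interface to the model used from Python
Updated 7 months ago · 1 comment
The flm tokenizer and the original tokenizer produce different tokenization results
Updated 7 months ago · 1 comment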