Stargazers: 520 · Watchers: 6 · Issues: 113 · Forks: 48
Tlntin/Qwen-TensorRT-LLM Issues
Question about Triton synchronous vs. asynchronous interfaces (Updated 3 months ago, 10 comments)
How to build Qwen-72B-Chat-Int4 with tp=2 (Updated 3 months ago, 16 comments)
Running run.py fails with "Segmentation fault (core dumped)" (Closed 3 months ago, 8 comments)
ModuleNotFoundError: No module named 'transformers.models.qwen2' (Closed 3 months ago, 2 comments)
Running the build script fails: TypeError: RowLinear.__init__() got an unexpected keyword argument 'instance_id' (Closed 3 months ago, 2 comments)
Does the current Qwen-VL implementation support only a single input image, which must appear at the start of the input? (Closed 3 months ago, 1 comment)
OOM when benchmarking HF throughput; questions about Triton concurrency and streaming output (Closed 3 months ago, 21 comments)
Throughput dropped after enabling Triton + inflight_batching (Closed 5 months ago, 2 comments)
How can normal batch inference be supported? (Closed 3 months ago, 2 comments)
Can anyone share pre-built trt_engine files (4 GPUs) for Qwen or Qwen1.5 int4? (Updated 3 months ago, 11 comments)
Why doesn't GPU memory usage decrease after SmoothQuant quantization? (Closed 3 months ago, 6 comments)
Has anyone compared inference performance against vLLM? (Updated 3 months ago)
Building the official qwen_1_8B-Chat-int4 with auto-gptq fails: KeyError: 'transformer.h.0.attn.c_attn.qweight' (Closed 3 months ago, 5 comments)
Why is the 72B model marked experimental? The architecture should be the same, so what is the reason? (Closed 3 months ago, 2 comments)
Qwen-72B-Chat-Int4 killed (Closed 3 months ago, 1 comment)
ERROR: Failed to create instance: unexpected error when creating modelInstanceState (Closed 3 months ago, 3 comments)
Qwen1.5 GPTQ does not work (Closed 3 months ago, 2 comments)
Qwen1.5 GPTQ-Int4 build fails (Closed 3 months ago, 15 comments)
Qwen1.5 GPTQ build error (Closed 3 months ago, 1 comment)
Qwen2 build error (Closed 3 months ago, 5 comments)
After building Qwen-7B-Chat with TensorRT_LLM 0.7.0, the launched API does not seem to support concurrent access? (Closed 3 months ago, 2 comments)
Has anyone seen Qwen-72B return broken output once the input exceeds 2048? (Updated 4 months ago, 25 comments)
Has anyone tried serving HTTP with mpirun -n greater than 1? (Closed 4 months ago, 6 comments)
Is qwen-vl fine-tuned with SWIFT supported? (Updated 4 months ago, 1 comment)
Function calling raises an error (Closed 4 months ago)
Question: AttributeError: 'QWenForCausalLM' object has no attribute 'embedding' (Closed 4 months ago)
web demo error (Closed 4 months ago, 1 comment)
Converting the official Qwen-xxB-Chat-Int4 to TRT, with greedy search on both sides: is it normal that the TRT and torch results differ? (Updated 5 months ago, 9 comments)
inflight_batching (Updated 5 months ago, 23 comments)
Can TensorRT sampling be aligned with the sampling parameters in Qwen's official generation_config.json? (Updated 5 months ago, 4 comments)
Qwen-14B-Chat-Int4 produces incorrect predictions (Closed 5 months ago, 4 comments)
Qwen-VL build.py: error: unrecognized arguments: --use_rmsnorm_plugin --use_lookup_plugin float16 --max_prompt_embedding_table_size 2048 (Updated 5 months ago, 1 comment)
TypeError: missing a required argument: 'host_sink_token_length' (Closed 5 months ago, 2 comments)
qwen-14b int4-awq quantization fails (Closed 5 months ago, 7 comments)
After a successful Triton deployment, several extra processes appear on each GPU (Closed 5 months ago, 12 comments)
Error when deploying TensorRT-LLM with Triton (Closed 5 months ago, 9 comments)
Problems building tensorrt-llm on AutoDL (Closed 5 months ago, 6 comments)
Use official int4 weights, e.g. Qwen-1_8B-Chat-Int4 model (recommended) - Build TRT-LLM engine (Closed 5 months ago, 6 comments)
Help with running summarize.py (Closed 5 months ago, 1 comment)
How much inference speedup does this give? (Updated 5 months ago, 1 comment)
Qwen-14B INT4-AWQ quantization fails with tp=2 (Updated 6 months ago, 1 comment)
Triton's GPU memory usage is twice that of TensorRT-LLM (Closed 6 months ago, 19 comments)
What needs to be changed to adapt the API deployment to the Baichuan2 model? (Closed 6 months ago, 3 comments)
Qwen-14B-chat multi-batch error (Closed 6 months ago, 3 comments)
Running cli_chat.py after build fails (Closed 6 months ago, 15 comments)
cnn_dailymail (Closed 6 months ago, 30 comments)
AttributeError: '_Runtime' object has no attribute 'address' (Closed 6 months ago, 5 comments)
qwen-14b-chat-int4 produces garbled inference output after conversion (Closed 6 months ago, 3 comments)
Multi-node, multi-GPU inference (Updated 6 months ago, 3 comments)
Problem when running build (Closed 6 months ago, 4 comments)