alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

alibaba/rtp-llm Issues

编译报错 v0.2.0 版error: cannot convert ‘<brace-enclosed initializer list>’ to ‘const fastertransformer::AllGatherParams&’
Updated a month ago5
单机多卡如何制定卡号
Closed a month ago1
Glm4v运行问题
Closed a month ago3
glm4v 单卡Cuda out of memory
Closed a month ago1
Qwen-vl-chat的结果和transformer的结果不一样，有点奇怪地像续写出来的
Closed 2 months ago3
v0.2.0(cuda12)对比 v0.1.13(cuda11)表现下降
Updated a month ago1
请问 0.2.0 支持cuda 11环境么？
Closed a month ago
[Feature Request] Add support for CogVLM2
Closed a month ago5
ChatGLM4-9B运行不起来
Closed a month ago1
qwen2 gptq tp=4 报错：AssertionError: error config
Updated a month ago
Does it support Qwen2、ChatGLM4-9B?
Closed a month ago4
请问multimodel_mixin.py中BaseMultiModalWeightInfo的_get_vit_info函数中self.vit_weights？
Closed a month ago8
v0.1.13 load qwen2 gptq失败
Closed a month ago2
请问支持流式吗？
Closed a month ago1
openai.InternalServerError: Error code: 500 - {'error_code': 514, 'message': 'ErrorMsg: failed to malloc 134 blocks, only 28 blocks left
Closed a month ago1
多卡部署空闲但导致的其他模型速度降低很多
Closed 2 months ago2
Qwen Chat CUDA OutOfMemory
Updated 2 months ago2
build error：ERROR: An error occurred during the fetch of repository 'pip_gpu_cuda12_torch
Closed 4 months ago2
[Feature Request] llama3
Closed 2 months ago1
rtp-llm example test issue
Closed 2 months ago1
Buffer overflow at CudaAttentionOpTest::selfAttentionOpTest
Updated 2 months ago1
Remove print statements
Updated 2 months ago1
Poor performance at batchsize=1 on V100
Closed 3 months ago12
Build error in cuda:tensor_utils: ‘getTypeSize’ is not a member of ‘fastertransformer::Tensor
Closed 3 months ago1
qwen1.5-14b-chat部署awq
Closed 3 months ago3
failed to run : RuntimeError: torch.cat()
Closed 3 months ago1
多卡推理
Closed 3 months ago8
Feature request: encoder-decoder model support
Closed 3 months ago1
ValueError: max() arg is an empty sequence
Closed 3 months ago4
请问src/fastertransformer/models/multi_gpu_gpt/ParallelAttentionWrapper.cc的ContextAttention是什么概念呢？和SelfAttention有什么区别呢
Closed 3 months ago2
#RTP-LLM Developer Event# 春季限定活动，捉bug送美味咖啡☕️
Updated 4 months ago2
awq
Closed 4 months ago2
请问kmonitor metrics怎么开启打印呢？想测试一下每个阶段的耗时
Closed 4 months ago1
bazel构建成功，但是测试报错
Closed 4 months ago4
bazel cu11x 编译失败
Closed 4 months ago1
0.1.8 release cuda12.1 whl包不完整
Closed 4 months ago3
follow readme then error
Closed 4 months ago2
bazel编译失败
Closed 4 months ago1
Problem：多模态的部分是如何处理的？
Closed 4 months ago1
BUG: MISSING QUOTATION MARKS AND LINE BREAKS
Closed 4 months ago2
怎么使用qwen medusa推理加速
Closed 4 months ago1
Error in DeployDocker.md
Closed 4 months ago1
2 GPUs with TP=2 run Lora inference, one GPU
Closed 4 months ago1
bazel构建失败
Closed 4 months ago10
最新whl包无法启动server
Closed 4 months ago5
按照官方示例 https://github.com/alibaba/rtp-llm/blob/main/docs/Multimodal-Tutorial.md 报错 maga_transformer.config.exceptions.FtRuntimeException: raw request format cannot accept dict prompt
Closed 4 months ago1
Is there a plan to support Eagle?
Closed 4 months ago1
random_seed未生效
Closed 4 months ago1
[bug ?] mega_transformer/models/llava.py中encode_images方法
Closed 4 months ago1
KeyError: 'MODEL_TYPE'
Closed 5 months ago1