Problem: How is the multimodal part handled?

Question

t90tank opened this issue 4 months ago · comments

A quick question: when qwen-vl processes an image on the GPU, does it block continuous batching?

ySingularity · Answer 1 · Tue Mar 19 2024 14:09:56 GMT+0800 (China Standard Time)

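If a batch contains several queries at once, a query whose image is being processed does slow down token generation for the other queries in that batch. Image processing currently happens after the batch has been assembled, during the word embedding step.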
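
To make the effect concrete, here is a toy Python model of it. This is only a sketch, not the framework's actual code: the `Request` class, `step_latency_ms`, and the two timing constants are hypothetical names and numbers made up for illustration. It assumes the image is encoded inline during the embedding stage of a shared batch step, so every query in that step waits for it.

```python
from dataclasses import dataclass

# Hypothetical timings, chosen only for illustration.
DECODE_MS = 20    # cost of one decode step for the whole batch
VISION_MS = 150   # cost of encoding one image at embedding time


@dataclass
class Request:
    name: str
    has_image: bool = False
    image_encoded: bool = False
    tokens_out: int = 0


def step_latency_ms(batch):
    """Latency of one batched step, shared by every request in the batch."""
    latency = DECODE_MS
    for req in batch:
        # Assumption: the image is encoded inline during the embedding
        # stage, after the batch has been formed, and only once per request.
        if req.has_image and not req.image_encoded:
            latency += VISION_MS
            req.image_encoded = True
    for req in batch:
        req.tokens_out += 1  # every request emits one token per step
    return latency


def simulate():
    text_req = Request("text-only")
    image_req = Request("with-image", has_image=True)
    return [
        step_latency_ms([text_req]),             # text query running alone
        step_latency_ms([text_req, image_req]),  # image query joins the batch
        step_latency_ms([text_req, image_req]),  # image already encoded
    ]


if __name__ == "__main__":
    print(simulate())
```

In this model the text-only query is not blocked outright, but the step in which the image query joins takes longer for everyone, which matches the slowdown described in the answer.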