Process hangs when using `tensor_parallel_size` and `data_parallel_size` together
harshakokel opened this issue · comments
Hello,

I noticed that my process hangs at `results = ray.get(object_refs)` when I use `data_parallel_size` as well as `tensor_parallel_size` for vllm models. For example, this call would hang:
`lm_eval --model vllm --model_args pretrained=gpt2,data_parallel_size=2,tensor_parallel_size=2 --tasks arc_easy --output ./trial/ --log_samples --limit 10`

These would not:

`lm_eval --model vllm --model_args pretrained=gpt2,data_parallel_size=1,tensor_parallel_size=2 --tasks arc_easy --output ./trial/ --log_samples --limit 10`

`lm_eval --model vllm --model_args pretrained=gpt2,data_parallel_size=2,tensor_parallel_size=1 --tasks arc_easy --output ./trial/ --log_samples --limit 10`
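For context, the harness's data-parallel path fans requests out across replicas and then blocks on a gather step; the hang reported above occurs at that gather. Below is a minimal, runnable sketch of the same fan-out/gather pattern. It is an illustration only: the real code uses Ray workers and `ray.get(object_refs)`, while this sketch substitutes a thread pool, and `run_shard` is a hypothetical stand-in for one replica's `llm.generate` call.

```python
# Sketch of the data-parallel fan-out/gather pattern (illustrative only;
# lm-eval's vllm backend uses Ray actors and ray.get, not a thread pool).
from concurrent.futures import ThreadPoolExecutor


def run_shard(shard):
    # Hypothetical stand-in for one DP replica running llm.generate()
    # on its slice of the requests.
    return [f"gen:{prompt}" for prompt in shard]


prompts = ["p0", "p1", "p2", "p3"]
data_parallel_size = 2

# Split the requests across DP replicas (round-robin here for simplicity).
shards = [prompts[i::data_parallel_size] for i in range(data_parallel_size)]

with ThreadPoolExecutor(max_workers=data_parallel_size) as pool:
    futures = [pool.submit(run_shard, s) for s in shards]
    # The reported hang occurs at this blocking gather step
    # (ray.get(object_refs) in the real code).
    results = [f.result() for f in futures]

# Flatten the per-replica outputs back into one list.
flat = [out for shard_out in results for out in shard_out]
print(flat)
```

If any one replica deadlocks (e.g. during distributed initialization when TP and DP are combined), the gather never returns, which matches the symptom described above.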
Does anyone else face a similar problem?
Hi! What version of vLLM are you running with?
@baberabb has observed problems like this before with later versions of vllm (>v0.3.3, I believe).
I am on vllm `0.3.2`. Is this a vllm problem? Should I be raising an issue on that repo?
Hey. Have you tried caching the weights by running with DP=1 until they are downloaded? I found it prone to hang with DP otherwise.
Yes, the weights are cached. The process is hanging after `llm.generate` returns results.
Hmm, it's working for me with `0.3.2`. Have you tried running on a fresh virtual environment?
Just tried it on a separate server in a new env and still face the same issue. What version of ray do you have? Mine is `ray==2.10.0`.
Probably the latest one. I installed it with `pip install -e ".[vllm]"` on runpod with 4 GPUs.