vllm on tensor parallel - RuntimeError: oneCCL: ze_fd_manager.cpp:144 init_device_fds: EXCEPTION: opendir failed: could not open device directory
flekol opened this issue
Describe the bug
When using vLLM in Docker with tensor parallelism enabled, I get:
RuntimeError: oneCCL: ze_fd_manager.cpp:144 init_device_fds: EXCEPTION: opendir failed: could not open device directory
Serving on a single card works.
Llama.cpp also works across multiple GPUs.
I remember trying this a few months ago on an Intel system with 2 GPUs, and it worked then.
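For context, oneCCL's init_device_fds enumerates the GPU device nodes under /dev/dri, and the exception above means that directory could not be opened from inside the container. A quick sanity check can emulate that opendir call; this is my own sketch (the helper name is mine, not part of oneCCL):

```shell
#!/bin/sh
# Hypothetical check: try to enumerate the GPU device directory the way
# oneCCL's init_device_fds does, and report the result.
check_device_dir() {
    dir="${1:-/dev/dri}"
    if ls "$dir" >/dev/null 2>&1; then
        echo "ok: $dir is readable"
    else
        # Same wording as the oneCCL exception, for easy comparison
        echo "opendir failed: could not open device directory ($dir)"
    fi
}

check_device_dir /dev/dri
```

Running this inside the container (e.g. via `docker compose exec`) shows whether the container can see the device nodes at all, before vLLM ever starts.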
How to reproduce
Steps to reproduce the error:
- use a system with multiple GPUs
- start the server with --tensor-parallel-size greater than 1
Environment information
I'm building my own Docker image based on this:
FROM intelanalytics/ipex-llm-serving-xpu:latest
WORKDIR /temp
SHELL ["/bin/bash", "-c"]
RUN apt update && apt install -y libpng16-16
RUN wget http://mirrors.kernel.org/ubuntu/pool/main/libj/libjpeg-turbo/libjpeg-turbo8_2.1.2-0ubuntu1_amd64.deb
RUN apt install -y ./libjpeg-turbo8_2.1.2-0ubuntu1_amd64.deb
WORKDIR /llm
RUN . /opt/intel/1ccl-wks/setvars.sh
ENTRYPOINT python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
--served-model-name ${served_model_name} \
--quantization $quantization \
--model $model \
--port $port \
--trust-remote-code \
--block-size 8 \
--gpu-memory-utilization ${gpu_memory_utilization} \
--device xpu \
--dtype $dtype \
--enforce-eager \
--load-in-low-bit ${load_in_low_bit} \
--max-model-len ${max_model_len} \
--max-num-batched-tokens ${max_num_batched_tokens} \
--max-num-seqs ${max_num_seqs} \
--tensor-parallel-size ${tensor_parallel_size} \
--disable-async-output-proc \
--distributed-executor-backend ray
The docker-compose file looks like the following:
services:
  vllm-ipex:
    image: intelanalytics/ipex-llm-serving-xpu:latest
    container_name: vllm-ipex
    build:
      dockerfile: ./dockerfile/dockerfile
    volumes:
      - "/models/huggingface:/root/.cache/huggingface"
      - "/models/vllm:/llm/models"
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    # restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
    tty: true
    ports:
      - 8000:8000
    shm_size: "64g"
    environment:
      - model=Qwen/Qwen2.5-32B-Instruct-AWQ
      - served_model_name=Qwen2.5-32B-Instruct-AWQ
      - quantization=awq
      - TZ=Europe/Berlin
      - SYCL_CACHE_PERSISTENT=1
      - CCL_WORKER_COUNT=2
      - FI_PROVIDER=shm
      - CCL_ATL_TRANSPORT=ofi
      - CCL_ZE_IPC_EXCHANGE=sockets
      - CCL_ATL_SHM=1
      - USE_XETLA=OFF
      - SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2
      - TORCH_LLM_ALLREDUCE=0
      - CCL_SAME_STREAM=1
      - CCL_BLOCKING_WAIT=0
      - port=8000
      - gpu_memory_utilization=0.95
      - dtype=float16
      - load_in_low_bit=asym_int4
      - max_model_len=2048
      - max_num_batched_tokens=4000
      - max_num_seqs=256
      - tensor_parallel_size=2
      - pipeline_parallel_size=1
      - VLLM_LOGGING_LEVEL=DEBUG
      - VLLM_TRACE_FUNCTION=1
System: Ubuntu 24.04
CPU: EPYC 7282
MB: Supermicro h12ssl
RAM: 256GB
GPUs: 4xARC 770 LE
Hi, can you try adding privileged: true to the docker-compose file and see if this error persists?
Thanks a lot.
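For anyone landing here later, the suggestion amounts to one extra key on the service; a sketch against the compose file above (privileged mode grants the container full device access, which is broader than the devices: mapping alone):

```yaml
services:
  vllm-ipex:
    privileged: true   # full device access; resolves the opendir failure on /dev/dri
    devices:
      - /dev/dri:/dev/dri
```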
It works! Now I feel stupid that I didn't come up with this on my own :)