EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Home Page: https://lmms-lab.github.io/lmms-eval-blog/

Model will be loaded on different devices when using multiple GPUs.

baichuanzhou opened this issue

It appears that the model is loaded across different GPUs when num_processes is set to more than one, which causes this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Here's my command to launch:

accelerate launch --num_processes=2 -m lmms_eval --model llava   --model_args pretrained="xxx,conv_template=xxx"   --tasks gqa,vqav2,scienceqa,textvqa --batch_size 1 --log_samples --log_samples_suffix xxx --output_path ./logs/

I found a temporary fix by installing a previous version:
pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git@bf4c78b7e405e2ca29bf76f579371382fec3dd02
With this version, multi-GPU inference works fine.

May I ask at which line of the inference code this error occurred?

Sorry for the delay.

Here is one error message:

[lmms_eval/models/llava.py:386] ERROR Error Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution) in generating.

You might also want to try setting device_map=auto in your model_args when you run multiple processes.

--model_args pretrained=xxx,conv_template=xxx,device_map=auto

Setting device_map to auto didn't do the trick. Here's my command:

srun -p xxx --gres=gpu:4 accelerate launch --num_processes=4 --main_process_port 19500 -m lmms_eval --model llava   --model_args pretrained="xxx,conv_template=xxx,device_map=auto"   --task textvqa_val,vizwiz_vqa_val,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_hermes2_llama3_merged_data_v1.1_anyres_tune_vit --output_path ./logs/

I noticed one difference between evaluating with v0.1.2 and with bf4c78b7e405e2ca29bf76f579371382fec3dd02 in the logger output:
v0.1.2: [lmms_eval/models/llava.py:124] INFO Using single device: cuda
bf4c78b7e405e2ca29bf76f579371382fec3dd02: [lmms_eval/models/llava.py:104] INFO Using 4 devices with data parallelism

Line 104 appears to be here.

Sorry, my bad

You should set device_map="" when using multiple processes. Set device_map=auto only when you use num_processes=1.
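For reference, a minimal sketch of the two launch modes based on this advice (the pretrained path, conv_template, and task list below are placeholders, not taken from this thread):

Data parallelism across 4 processes, one full model copy per GPU (device_map left empty):
accelerate launch --num_processes=4 -m lmms_eval --model llava --model_args pretrained=xxx,conv_template=xxx,device_map="" --tasks textvqa_val --batch_size 1 --output_path ./logs/

Single process with the model sharded across all visible GPUs (device_map=auto):
accelerate launch --num_processes=1 -m lmms_eval --model llava --model_args pretrained=xxx,conv_template=xxx,device_map=auto --tasks textvqa_val --batch_size 1 --output_path ./logs/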

Thanks. Now it works!