AILab-CVC / SEED-Bench

(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.

[bugs] LLaVA-Evaluation : RuntimeError: Expected all tensors to be on the same device

JJJYmmm opened this issue · comments

Problems

When testing LLaVA-v1.5 with eval.py, the following error occurs:

*** RuntimeError: Expected all tensors to be on the same device, but found at least two devices, 
cuda:0 and cuda:1! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

This happens because LLaVA loads the model through huggingface with the default parameter device_map="auto", so the weights are sharded across multiple GPUs (pipeline parallelism):

def load_pretrained_model(model_path, model_base, model_name,
                          load_8bit=False, load_4bit=False,
                          device_map="auto", device="cuda", **kwargs):
    ...

Meanwhile, eval.py calls the .cuda() method on the wrapped model (MLLM_Tester), which moves all of the parameters onto the default GPU again:

model = build_model(args.model).cuda()

This conflicts with accelerate's AlignDevicesHook: the hooks still move the data to the GPUs the layers were originally dispatched to, while the parameters now all sit on the default GPU, which triggers the error above.
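A quick way to see the mismatch is to collect the set of devices the parameters live on; a dispatched model reports more than one. This is a hypothetical diagnostic, not part of the repo, and it uses the "meta" device to stand in for a second GPU so it runs anywhere:

```python
import torch

def param_devices(model):
    """Return the set of devices the model's parameters live on."""
    return {p.device for p in model.parameters()}

# Simulate a model sharded across two devices ("meta" stands in for a
# second GPU so the sketch runs without multiple CUDA devices).
model = torch.nn.Sequential(torch.nn.Linear(2, 2), torch.nn.Linear(2, 2))
model[1].weight = torch.nn.Parameter(torch.empty(2, 2, device="meta"))
model[1].bias = torch.nn.Parameter(torch.empty(2, device="meta"))

print(param_devices(model))  # more than one device -> addmm will fail at forward time
```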

Solution

I think removing .cuda() here is fine, though I have only checked the LLaVA interface.

model = build_model(args.model).cuda()
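An alternative to deleting the call outright is to guard it, so models on a single device still get moved. This is a sketch under the assumption that accelerate's hf_device_map attribute marks dispatched models; safe_cuda is a hypothetical helper, not an existing function in the repo:

```python
import torch

def safe_cuda(model):
    """Move the model to the default GPU only when it is safe to do so."""
    # accelerate attaches hf_device_map to models it dispatched across
    # devices; calling .cuda() on those re-breaks the placement.
    if getattr(model, "hf_device_map", None):
        return model
    return model.cuda() if torch.cuda.is_available() else model

# Drop-in replacement in eval.py:
# model = safe_cuda(build_model(args.model))
```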