cckuailong / SuperAdapters

Finetune ALL LLMs with ALL Adapters on ALL Platforms!


Error in ChatGLM inference

sudusuperman opened this issue · comments

I'm testing ChatGLM, following the instructions in README.md:

python finetune.py --model_type chatglm --data "data/train/" --model_path "LLMs/chatglm/chatglm-6b/" --adapter "lora" --output_dir "output/chatglm"

And then

python inference.py --model_type chatglm --instruction "Who are you?" --model_path "LLMs/chatglm/chatglm-6b/" --adapter_weights "output/chatglm" --max_new_tokens 256

I get:
...
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:04<00:00, 1.73it/s]
Find 1 cases
The dtype of attention mask (torch.int64) is not bool

LLM says:
Eval Error


Reproduce:

1. Training data I'm using (see the sanity-check sketch after this list):
{"id": "seed_task_0", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}
{"id": "seed_task_1", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}
{"id": "seed_task_2", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}
{"id": "seed_task_3", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}

2. Some dependency versions that differ from requirements.txt:
torch==2.1.0.dev20230830
torchaudio==2.1.0.dev20230830
torchvision==0.16.0.dev20230830
icetk==0.0.4
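
On the training data in item 1: a quick format check can rule out data problems. Below is a minimal sketch assuming the JSONL schema shown above; the path and required fields mirror the sample records, not SuperAdapters' internal loader.

import json
from pathlib import Path

# Sanity-check each line of the training data: every non-empty line must
# parse as JSON and carry the instruction/input/output fields seen above.
required = {"instruction", "input", "output"}
for path in Path("data/train").glob("*"):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)  # raises if a line is malformed
        missing = required - record.keys()
        if missing:
            print(f"{path}:{lineno} missing fields: {sorted(missing)}")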

"The dtype of attention mask (torch.int64) is not bool" is a warning that can be ignored.
Maybe the error is reported because some other reasons like "out of memory".
You can add "--debug" to show the error. If the error is "out of memory", please set max_token smaller, like 64 or 32

python inference.py --model_type chatglm --instruction "Who are you?" --model_path "LLMs/chatglm/chatglm-6b/" --adapter_weights "output/chatglm" --max_new_tokens 64 --debug
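
For context, the bare "Eval Error" text presumably appears because inference wraps generation in a try/except. A rough, hypothetical sketch of that pattern follows; the function and variable names are illustrative, not SuperAdapters' actual code.

import traceback

def evaluate(model, tokenizer, instruction, max_new_tokens, debug=False):
    # Hypothetical wrapper: any exception during generation is swallowed
    # and reported as "Eval Error" unless --debug prints the traceback.
    try:
        inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)
    except Exception:
        if debug:
            traceback.print_exc()  # shows the real cause, e.g. CUDA out of memory
        return "Eval Error"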

Maybe you should check how the official code uses model.generate(). I met the same problem with Qwen; after replacing the call following the official guidance, it generated what I needed.
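
For reference, the official Qwen README generates through the model's chat() helper instead of calling model.generate() directly. A minimal sketch based on that README; the public Qwen-7B-Chat checkpoint stands in for your own fine-tuned weights:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

# chat() applies Qwen's prompt template and generation config internally
response, history = model.chat(tokenizer, "Who are you?", history=None)
print(response)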

I met the same inference error after finetuning Qwen-7B. The inference error message is

LLM says:
Eval Error

When I add --debug to the command,

 python3 inference.py --model_type qwen --instruction "Who are you?" --input "" --model_path $model_path  --adapter_weights $output_dir  --max_new_tokens 10 --debug

the error message states:
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

I am using 8 V100 GPUs.

The error occurs in this call:

model.generate

I think the device of the model does not match the device of input_ids, but I have not managed to fix this bug.
As a workaround I add export CUDA_VISIBLE_DEVICES=0 to use only one GPU, which avoids the error, but I wonder how to run inference on multiple GPUs.
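
For what it's worth, the usual fix for this mismatch when a model is sharded with device_map="auto" is to move the inputs to the model's entry device before calling generate(). A minimal sketch, with an illustrative model path:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen-7B"  # illustrative; substitute your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",  # shards the layers across all visible GPUs
    trust_remote_code=True,
).eval()

# model.device reports the device holding the first parameters (the embeddings),
# so input_ids starts on the right GPU; accelerate then moves activations
# between shards during generation.
inputs = tokenizer("Who are you?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))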