cckuailong / SuperAdapters

Finetune ALL LLMs with ALL Adapters on ALL Platforms!


Error in ChatGLM inference

sudusuperman opened this issue · comments

I'm testing ChatGLM, following the instructions in README.md:

python finetune.py --model_type chatglm --data "data/train/" --model_path "LLMs/chatglm/chatglm-6b/" --adapter "lora" --output_dir "output/chatglm"

And then

python inference.py --model_type chatglm --instruction "Who are you?" --model_path "LLMs/chatglm/chatglm-6b/" --adapter_weights "output/chatglm" --max_new_tokens 256

I get:
...
Loading checkpoint shards: 100%|██████████████████| 8/8 [00:04<00:00, 1.73it/s]
Find 1 cases
The dtype of attention mask (torch.int64) is not bool

LLM says:
Eval Error


Reproduce:

1. Training data I'm using (see the sanity-check sketch after this list):
{"id": "seed_task_0", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}
{"id": "seed_task_1", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}
{"id": "seed_task_2", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}
{"id": "seed_task_3", "name": "breakfast_suggestion", "instruction": "who are you??", "input": "Who are you?", "output": "Yes, you can have 1 oatmeal banana protein shake and 4 strips of bacon. The oatmeal banana protein shake may contain 1/2 cup oatmeal, 60 grams whey protein powder, 1/2 medium banana, 1tbsp flaxseed oil and 1/2 cup watter, totalling about 550 calories. The 4 strips of bacon contains about 200 calories.", "is_classification": false}

2. Some dependency versions that differ from requirements.txt:
torch==2.1.0.dev20230830
torchaudio==2.1.0.dev20230830
torchvision==0.16.0.dev20230830
icetk==0.0.4
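
On the training data in item 1: a quick format check can rule out data problems. Below is a minimal sketch assuming the JSONL schema shown above; the path and required fields mirror the sample records, not SuperAdapters' internal loader.

import json
from pathlib import Path

# Sanity-check each line of the training data: every non-empty line must
# parse as JSON and carry the instruction/input/output fields seen above.
required = {"instruction", "input", "output"}
for path in Path("data/train").glob("*"):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)  # raises if a line is malformed
        missing = required - record.keys()
        if missing:
            print(f"{path}:{lineno} missing fields: {sorted(missing)}")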

"The dtype of attention mask (torch.int64) is not bool" is a warning that can be ignored.
Maybe the error is reported because some other reasons like "out of memory".
You can add "--debug" to show the error. If the error is "out of memory", please set max_token smaller, like 64 or 32

python inference.py --model_type chatglm --instruction "Who are you?" --model_path "LLMs/chatglm/chatglm-6b/" --adapter_weights "output/chatglm" --max_new_tokens 64 --debug
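
For context, the bare "Eval Error" text presumably appears because inference wraps generation in a try/except. A rough, hypothetical sketch of that pattern follows; the function and variable names are illustrative, not SuperAdapters' actual code.

import traceback

def evaluate(model, tokenizer, instruction, max_new_tokens, debug=False):
    # Hypothetical wrapper: any exception during generation is swallowed
    # and reported as "Eval Error" unless --debug prints the traceback.
    try:
        inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)
    except Exception:
        if debug:
            traceback.print_exc()  # shows the real cause, e.g. CUDA out of memory
        return "Eval Error"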

Maybe you should check how the official code uses model.generate(). I met the same problem with Qwen; after replacing the call following the official guidance, it generated what I needed.
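
For reference, the official Qwen README generates through the model's chat() helper instead of calling model.generate() directly. A minimal sketch based on that README; the public Qwen-7B-Chat checkpoint stands in for your own fine-tuned weights:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

# chat() applies Qwen's prompt template and generation config internally
response, history = model.chat(tokenizer, "Who are you?", history=None)
print(response)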

I met the same inference error after finetuning Qwen-7B. The inference error message is

LLM says:
Eval Error

When I add --debug to the command,

 python3 inference.py --model_type qwen --instruction "Who are you?" --input "" --model_path $model_path  --adapter_weights $output_dir  --max_new_tokens 10 --debug

the error message states:
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

I am using 8 V100 GPUs.

The error occurs in this call:

model.generate

I think the device of the model does not match the device of input_ids, but I have not managed to fix this bug.
As a workaround I add export CUDA_VISIBLE_DEVICES=0 to use only one GPU, which avoids the error, but I wonder how to run inference on multiple GPUs.
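
For what it's worth, the usual fix for this mismatch when a model is sharded with device_map="auto" is to move the inputs to the model's entry device before calling generate(). A minimal sketch, with an illustrative model path:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen-7B"  # illustrative; substitute your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",  # shards the layers across all visible GPUs
    trust_remote_code=True,
).eval()

# model.device reports the device holding the first parameters (the embeddings),
# so input_ids starts on the right GPU; accelerate then moves activations
# between shards during generation.
inputs = tokenizer("Who are you?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))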