rmihaylov / falcontune

Tune any FALCON in 4-bit


generate get error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

631068264 opened this issue · comments

commented

Run

falcontune generate \
    --interactive \
    --model falcon-7b \
    --weights tiiuae/falcon-7b \
    --lora_apply_dir falcon-7b-alpaca \
    --max_new_tokens 500 \
    --use_cache \
    --do_sample \
    --instruction "List five possible applications of artificial intelligence"

Error

Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 2/2 [00:42<00:00, 21.04s/it]
Device map for lora: auto
falcon-7b-alpaca loaded
/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:318: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
  File "/data/home/yaokj5/anaconda3/envs/falcon/bin/falcontune", line 33, in <module>
    sys.exit(load_entry_point('falcontune==0.1.0', 'console_scripts', 'falcontune')())
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/run.py", line 88, in main
    args.func(args)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/generate.py", line 71, in generate
    generated_ids = model.generate(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/generate.py", line 27, in autocast_generate
    return self.model.non_autocast_generate(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/peft/peft_model.py", line 731, in generate
    outputs = self.base_model.generate(**kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/transformers/generation/utils.py", line 1565, in generate
    return self.sample(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/transformers/generation/utils.py", line 2612, in sample
    outputs = self(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 1072, in forward
    transformer_outputs = self.transformer(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 967, in forward
    outputs = block(
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/home/yaokj5/anaconda3/envs/falcon/lib/python3.10/site-packages/falcontune-0.1.0-py3.10.egg/falcontune/model/falcon/model.py", line 722, in forward
    mlp_output += attention_output
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
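The failing line, `mlp_output += attention_output`, is an in-place add between two tensors that accelerate's `auto` device map has placed on different GPUs. A minimal sketch of the failure pattern and the usual fix, moving one operand onto the other's device first (the function name here is illustrative, not falcontune's actual code):

```python
import torch

def add_residual(mlp_output: torch.Tensor, attention_output: torch.Tensor) -> torch.Tensor:
    # In-place addition requires both tensors to live on the same device; with
    # a model sharded across cuda:0 and cuda:1 this is exactly what raises the
    # RuntimeError in the traceback above. Moving the second operand onto the
    # first operand's device avoids the mismatch (and is a no-op when the
    # devices already agree).
    mlp_output += attention_output.to(mlp_output.device)
    return mlp_output

# CPU demo: .to() is a no-op here, but the same pattern works across GPUs.
a = torch.ones(3)
b = torch.full((3,), 2.0)
print(add_residual(a, b))  # tensor([3., 3., 3.])
```

The alternative, which several commenters use below, is to avoid the split entirely by exposing only one GPU to the process.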

I have the same problem
@rmihaylov

commented

Running into the same issue. Any luck on this?

commented

`export CUDA_VISIBLE_DEVICES=1` works for me.
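For context, a sketch of the workaround: exposing a single GPU means the `auto` device map cannot shard the model across devices, so every tensor ends up on the same card. The GPU index `1` is just an example; pick whichever card has enough free memory.

```shell
# Hide all GPUs except index 1 from this process, then rerun generation.
export CUDA_VISIBLE_DEVICES=1
echo "Visible GPUs: $CUDA_VISIBLE_DEVICES"
# Rerun the same falcontune generate command from the issue above.
```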

commented

Thanks. Seems to be working.

But I ran into an OutOfMemory error after fine-tuning, while saving the model:

  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 336, in _save_to_state_dict
    self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 96, in undo_layout
    outputs = torch.empty_like(tensor)  # note: not using .index_copy because it was slower on cuda
torch.cuda.OutOfMemoryError: CUDA out of memory.

Not sure which knob to adjust.