artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Home Page: https://arxiv.org/abs/2305.14314


Merge issue

qburst-fidha opened this issue

I am trying to merge my adapter with the base model after finetuning it with QLoRA.

Error

==================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/libbitsandbytes_cuda117.so
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/ubuntu/miniconda3/envs/training_llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [18:13<00:00, 37.72s/it]
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
Traceback (most recent call last):
File "/home/ubuntu/llma2/training/qlora/merge_v1.py", line 18, in
model = model.merge_and_unload()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/peft/tuners/lora/model.py", line 658, in merge_and_unload
return self._unload_and_optionally_merge(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/peft/tuners/lora/model.py", line 390, in _unload_and_optionally_merge
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
TypeError: Linear4bit.merge() got an unexpected keyword argument 'adapter_names'

This is my code:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

model_id = "./models/WizardLM_WizardLM-70B-V1.0"
adapter_id = "./models/checkpoint-300/adapter_model/"

tokenizer = LlamaTokenizer.from_pretrained(model_id)
# Base model loaded in 4-bit, the same way it was loaded for QLoRA finetuning
model = LlamaForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map='auto', torch_dtype=torch.float16)

# Attach the trained LoRA adapter and try to merge it into the base weights
model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload()
torch.save(model.state_dict(), "./final_model/model.bin")
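
For reference, this is the variant I was planning to try instead: loading the base model in fp16 rather than 4-bit before merging, since the traceback suggests the merge is hitting the quantized Linear4bit layers. This is only a sketch under that assumption (same placeholder paths as above), not something I have confirmed works:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

model_id = "./models/WizardLM_WizardLM-70B-V1.0"
adapter_id = "./models/checkpoint-300/adapter_model/"

# Load the base model without 4-bit quantization so the LoRA weights can be
# merged into ordinary Linear layers (needs enough memory for the fp16 weights).
model = LlamaForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)
tokenizer = LlamaTokenizer.from_pretrained(model_id)

model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload()

# Save in the standard Hugging Face format instead of a raw state_dict
model.save_pretrained("./final_model")
tokenizer.save_pretrained("./final_model")

It also looks like the TypeError about adapter_names could come from mismatched peft versions on my machine, but I have not verified that.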