Merge issue
qburst-fidha opened this issue
I am trying to merge my adapter with the base model after fine-tuning with QLoRA, but merge_and_unload() fails with the error below.
Error
==================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/libbitsandbytes_cuda117.so
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/ubuntu/miniconda3/envs/training_llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/bitsandbytes-0.39.0-py3.11.egg/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [18:13<00:00, 37.72s/it]
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:381: UserWarning: do_sample is set to False. However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:386: UserWarning: do_sample is set to False. However, top_p is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
Traceback (most recent call last):
File "/home/ubuntu/llma2/training/qlora/merge_v1.py", line 18, in <module>
model = model.merge_and_unload()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/peft/tuners/lora/model.py", line 658, in merge_and_unload
return self._unload_and_optionally_merge(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/miniconda3/envs/training_llama/lib/python3.11/site-packages/peft/tuners/lora/model.py", line 390, in _unload_and_optionally_merge
target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
TypeError: Linear4bit.merge() got an unexpected keyword argument 'adapter_names'
This is my code:
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

model_id = "./models/WizardLM_WizardLM-70B-V1.0"
adapter_id = "./models/checkpoint-300/adapter_model/"

tokenizer = LlamaTokenizer.from_pretrained(model_id)
# Base model is loaded 4-bit quantized
model = LlamaForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map="auto", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload()  # raises the TypeError above
torch.save(model.state_dict(), "./final_model/model.bin")
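For context, merge_and_unload folds each LoRA update back into its base weight matrix as W' = W + (alpha/r) * B @ A, which needs the base weights in a floating-point format; the quantized Linear4bit weights loaded with load_in_4bit cannot simply be updated in place, and this peft version's Linear4bit.merge() also does not accept the adapter_names argument the caller passes. A minimal NumPy sketch of the merge arithmetic (all names, shapes, and values are illustrative, not taken from the model above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: a 16x16 base weight with a rank-4 LoRA adapter.
d, r, alpha = 16, 4, 8
W = rng.standard_normal((d, d)).astype(np.float32)  # frozen base weight
A = rng.standard_normal((r, d)).astype(np.float32)  # LoRA down-projection
B = np.zeros((d, r), dtype=np.float32)              # LoRA up-projection (zero-init)
B[0, 0] = 0.5                                       # pretend training moved one entry

scaling = alpha / r
W_merged = W + scaling * (B @ A)  # the per-layer operation merge_and_unload performs

# After merging, the adapter path is redundant:
# W_merged @ x equals W @ x plus the scaled low-rank correction.
x = rng.standard_normal(d).astype(np.float32)
assert np.allclose(W_merged @ x, W @ x + scaling * (B @ (A @ x)), atol=1e-5)
```

This is why merging is commonly done against a base model reloaded in fp16 rather than against the 4-bit quantized weights.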