intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

[integration]: merging bfloat16 model failed

raj-ritu17 opened this issue

base-model: Weyaxi/Dolphin2.1-OpenOrca-7B

Scenario:

(ft_Qlora) intel@imu-nex-sprx92-max1-sut:~/ritu/ipex-llm/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora$ python ./export_merged_model.py --repo-id-or-model-path Weyaxi/Dolphin2.1-OpenOrca-7B --adapter_path ./out-dir-FT/tmp-checkpoint-1400/ --output_path ./out-dir-FT/tmp-checkpoint-1400-merged
/home/intel/miniconda3/envs/ft_Qlora/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-05-24 14:37:05,078 - INFO - intel_extension_for_pytorch auto imported
2024-05-24 14:37:05,084 - WARNING - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
/home/intel/miniconda3/envs/ft_Qlora/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
2024-05-24 14:37:05,713 - ERROR -

****************************Usage Error************************
Please use torch_dtype=torch.bfloat16 when setting load_in_low_bit='bf16'.
2024-05-24 14:37:05,713 - ERROR -

****************************Call Stack*************************
Failed to merge the adapter, error: Please use torch_dtype=torch.bfloat16 when setting load_in_low_bit='bf16'..
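
The error itself states the requirement: when the base model is loaded with load_in_low_bit='bf16', the loader also expects an explicit torch_dtype=torch.bfloat16. As a rough sketch of a load call that satisfies this (assuming the load goes through ipex-llm's transformers wrapper, which is what accepts load_in_low_bit; the model path is just the one from this issue):

    import torch
    from ipex_llm.transformers import AutoModelForCausalLM

    # The bf16 low-bit setting and the torch dtype must match, otherwise the
    # "Usage Error" shown above is raised.
    base_model = AutoModelForCausalLM.from_pretrained(
        "Weyaxi/Dolphin2.1-OpenOrca-7B",
        load_in_low_bit="bf16",
        torch_dtype=torch.bfloat16,
        device_map={"": "cpu"},
    )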

What else I tried:
Added 'torch_dtype=torch.bfloat16' in the utils code (in the merge_adapter function), e.g.
--> common/utils/util.py +183

    try:
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model,
            #load_in_low_bit="nf4", # should load the original model
            #torch_dtype=torch.float16,
            #ritu: added for DolphinOrca-7b
            torch_dtype=torch.bfloat16,
            #end
            device_map={"": "cpu"},
        )

        lora_model = PeftModel.from_pretrained(
            base_model,
            adapter_path,
            device_map={"": "cpu"},
            #torch_dtype=torch.float16,
            torch_dtype=torch.bfloat16,
        )

This doesn't solve the issue; the merge now fails with an empty error message:

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 11.78it/s]
2024-05-24 10:35:22,564 - INFO - Converting the current model to bf16 format......
[2024-05-24 10:35:22,912] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to xpu (auto detect)
Failed to merge the adapter, error: .
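
For reference, a minimal, generic adapter-merge flow with plain Hugging Face transformers + PEFT looks roughly like the sketch below. This is not the repo's export_merged_model.py; the paths are the ones from this issue, and the tokenizer copy at the end is just for convenience.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the full-precision base weights in bf16 on CPU.
    base = AutoModelForCausalLM.from_pretrained(
        "Weyaxi/Dolphin2.1-OpenOrca-7B",
        torch_dtype=torch.bfloat16,
        device_map={"": "cpu"},
    )
    # Attach the QLoRA adapter checkpoint on top of the base model.
    lora = PeftModel.from_pretrained(
        base,
        "./out-dir-FT/tmp-checkpoint-1400/",
        device_map={"": "cpu"},
    )
    # Fold the LoRA weights back into the base model and save the result.
    merged = lora.merge_and_unload()
    merged.save_pretrained("./out-dir-FT/tmp-checkpoint-1400-merged")
    AutoTokenizer.from_pretrained("Weyaxi/Dolphin2.1-OpenOrca-7B").save_pretrained(
        "./out-dir-FT/tmp-checkpoint-1400-merged"
    )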

Hi @raj-ritu17, I have reproduced the error during model merging. We will try to fix it and update here once it is solved.

Hi @raj-ritu17,
We have fixed this bug. Please install the latest ipex-llm (2.1.0b20240527); there is no need to modify the utils code, just run this script to merge the model.

According to my local experiment, the merging process works, and you can use the merged model for inference following https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral
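
As a rough sketch of that inference step (loosely following the pattern in the linked example; the merged-model path and prompt are placeholders, and load_in_4bit plus the "xpu" device assume an Intel GPU setup like the one in this issue):

    import torch
    from ipex_llm.transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    model_path = "./out-dir-FT/tmp-checkpoint-1400-merged"
    # Load the merged model with ipex-llm low-bit optimization and move it to the Intel GPU.
    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
    model = model.to("xpu")
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    prompt = "What is AI?"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")
    with torch.inference_mode():
        output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))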