oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

Insufficient size of temp_dq buffer when loading model

RedCodedWizard opened this issue · comments

I am having an issue loading TheBloke/Xwin-MLewd-7B-V0.2-GPTQ. The 13B version of the model loaded correctly and worked fine, apart from response times being a little slow on my setup, but when I try the 7B version I keep getting the runtime error below.
I have tried both the ExLlamav2_HF and AutoGPTQ loaders.

With ExLlamav2_HF:
Lowering max_seq_len from 4096 to 2048 made no difference.

Traceback (most recent call last):
  File "D:\OOBABOOGA\text-generation-webui-main\modules\ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 325, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\exllamav2_hf.py", line 181, in from_pretrained
    return Exllamav2HF(config)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\exllamav2_hf.py", line 50, in __init__
    self.ex_model.load(split)
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 332, in load
    for item in f: x = item
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 355, in load_gen
    module.load()
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 254, in load
    self.q_proj.load()
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 109, in load
    self.q_handle = ext.make_q_matrix(w,
                    ^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\ext.py", line 247, in make_q_matrix
    return ext_c.make_q_matrix(w["qweight"],
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Insufficient size of temp_dq buffer
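For context (my assumption, not confirmed against exllamav2's source): temp_dq appears to be a scratch buffer for dequantized weights, so a packed tensor that is larger than the loaded config implies could overflow it. This sketch computes the fp16 size a packed 4-bit GPTQ qweight dequantizes to, showing that a 13B-shaped projection (hidden_size 5120) needs more scratch space than a 7B-shaped one (4096) — consistent with the shape mismatch in the AutoGPTQ traceback below:

```python
def dequant_fp16_bytes(qweight_shape, bits=4):
    """fp16 bytes a packed GPTQ qweight expands to when dequantized.
    GPTQ packs 32 // bits quantized values into each int32 along the
    input dimension, so an [in_features * bits // 32, out_features]
    int32 tensor dequantizes to [in_features, out_features] fp16."""
    packed_rows, out_features = qweight_shape
    in_features = packed_rows * 32 // bits
    return in_features * out_features * 2  # 2 bytes per fp16 value

# q_proj for a 7B config (hidden_size 4096) vs a 13B config (5120):
seven_b = dequant_fp16_bytes((4096 * 4 // 32, 4096))     # 33_554_432 bytes
thirteen_b = dequant_fp16_bytes((5120 * 4 // 32, 5120))  # 52_428_800 bytes
```

If a scratch buffer were sized from a 7B config but the shards actually contain 13B-sized tensors, this kind of overflow error would be expected.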

With AutoGPTQ (wbits: 4, groupsize: 128):

Traceback (most recent call last):
  File "D:\OOBABOOGA\text-generation-webui-main\modules\ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 312, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\AutoGPTQ_loader.py", line 59, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\auto_gptq\modeling\auto.py", line 135, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\auto_gptq\modeling\_base.py", line 1246, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 1736, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 358, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32000, 5120]) in "weight" (which has shape torch.Size([32001, 4096])), this look incorrect.
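Taken together, the AutoGPTQ ValueError is the more telling of the two: a tensor of shape [32000, 5120] (a 13B-width embedding) is being loaded into a module expecting [32001, 4096] (7B width), which suggests the files in the 7B model folder are mismatched, e.g. 13B shards from an interrupted or mixed download sitting next to a 7B config.json. As a quick check, this sketch (stdlib only; the safetensors header is 8 bytes of little-endian length followed by a JSON header) compares each shard's embedding width against config.json — the path handling here is an illustration, not webui code:

```python
import json
import struct
from pathlib import Path

def shard_tensor_shapes(path):
    """Read tensor names and shapes from a .safetensors file header
    without loading any weights: the format is an 8-byte little-endian
    header length followed by that many bytes of JSON."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {k: v["shape"] for k, v in header.items() if k != "__metadata__"}

def find_shape_mismatches(model_dir):
    """Flag embedding tensors whose width disagrees with config.json's
    hidden_size -- a sign that shards from a different model size ended
    up in this folder (e.g. 13B weights next to a 7B config)."""
    model_dir = Path(model_dir)
    hidden = json.loads((model_dir / "config.json").read_text())["hidden_size"]
    mismatches = []
    for shard in sorted(model_dir.glob("*.safetensors")):
        for name, shape in shard_tensor_shapes(shard).items():
            if name.endswith("embed_tokens.weight") and shape[-1] != hidden:
                mismatches.append((shard.name, name, shape, hidden))
    return mismatches
```

If this reports a mismatch, deleting the model folder and re-downloading the 7B files from scratch should clear both loader errors.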