Can't load CodeLlama-13b
user799595 opened this issue · comments
I would like to finetune CodeLlama-13b in a memory efficient way.
I was able to do it with CodeLlama-7b, but failing with 13b.
I can't load the model unsloth/codellama-13b-bnb-4bit:
model, tokenizer = unsloth.FastLanguageModel.from_pretrained('codellama/CodeLlama-13b-hf', load_in_4bit=True)
ValueError: Supplied state dict for model.layers.28.mlp.gate_proj.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.
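For context, this error comes from the bitsandbytes 4-bit quantizer in transformers: when it loads a checkpoint it believes is prequantized, every 4-bit weight must be accompanied by its quantization metadata (keys like `...weight.quant_state.bitsandbytes__nf4`) in the state dict, and a plain fp16 checkpoint has none. A minimal sketch of that key check, using plain dicts in place of a real state dict (the helper name `has_quant_state` is mine, not the transformers API):

```python
def has_quant_state(state_dict, param_name, quant_type="nf4"):
    """Mimic the check in transformers' quantizer_bnb_4bit: a prequantized
    4-bit parameter must ship its bitsandbytes quant_state key."""
    key = f"{param_name}.quant_state.bitsandbytes__{quant_type}"
    return key in state_dict

# A correctly prequantized checkpoint carries the metadata key:
good = {
    "model.layers.28.mlp.gate_proj.weight": b"...",
    "model.layers.28.mlp.gate_proj.weight.quant_state.bitsandbytes__nf4": b"...",
}
# A plain (unquantized) checkpoint loaded as if it were prequantized does not:
bad = {"model.layers.28.mlp.gate_proj.weight": b"..."}

assert has_quant_state(good, "model.layers.28.mlp.gate_proj.weight")
assert not has_quant_state(bad, "model.layers.28.mlp.gate_proj.weight")
```

So the ValueError means the loader went down the "already 4-bit quantized" path but found ordinary full-precision weights in the state dict.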
I tried to quantize it first, but that also failed:
model, tokenizer = unsloth.FastLanguageModel.from_pretrained('codellama/CodeLlama-13b-hf', load_in_4bit=False)
model.save_pretrained_gguf('./codellama-13b-bnb-4bit', tokenizer=tokenizer)
RuntimeError: The weights trying to be saved contained shared tensors [{'model.layers.26.self_attn.q_proj.weight', 'model.layers.31.self_attn.v_proj.weight', 'model.layers.32.self_attn.q_proj.weight', 'model.layers.39.self_attn.q_proj.weight', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.35.self_attn.q_proj.weight', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.33.self_attn.q_proj.weight', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.33.self_attn.k_proj.weight', 'model.layers.35.self_attn.k_proj.weight', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.36.self_attn.q_proj.weight', 'model.layers.36.self_attn.k_proj.weight', 'model.layers.37.self_attn.q_proj.weight', 'model.layers.39.self_attn.k_proj.weight', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.38.self_attn.k_proj.weight', 'model.layers.34.self_attn.q_proj.weight', 'model.layers.33.self_attn.v_proj.weight', 'model.layers.32.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.33.self_attn.o_proj.weight', 'model.layers.36.self_attn.v_proj.weight', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.32.self_attn.v_proj.weight', 'model.layers.34.self_attn.v_proj.weight', 'model.layers.31.self_attn.k_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.34.self_attn.o_proj.weight', 'model.layers.36.self_attn.o_proj.weight', 'model.layers.39.self_attn.v_proj.weight', 'model.layers.39.self_attn.o_proj.weight', 'model.layers.34.self_attn.k_proj.weight', 'model.layers.32.self_attn.k_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.35.self_attn.v_proj.weight', 
'model.layers.37.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.37.self_attn.o_proj.weight', 'model.layers.37.self_attn.k_proj.weight', 'model.layers.35.self_attn.o_proj.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.38.self_attn.q_proj.weight', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.38.self_attn.v_proj.weight', 'model.layers.38.self_attn.o_proj.weight'}, {'model.layers.37.mlp.gate_proj.weight', 'model.layers.31.mlp.down_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.33.mlp.up_proj.weight', 'model.layers.35.mlp.gate_proj.weight', 'model.layers.31.mlp.gate_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.layers.27.mlp.down_proj.weight', 'model.layers.30.mlp.gate_proj.weight', 'model.layers.35.mlp.down_proj.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.38.mlp.gate_proj.weight', 'model.layers.33.mlp.down_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.30.mlp.down_proj.weight', 'model.layers.36.mlp.down_proj.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.33.mlp.gate_proj.weight', 'model.layers.37.mlp.up_proj.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.37.mlp.down_proj.weight', 'model.layers.32.mlp.gate_proj.weight', 'model.layers.39.mlp.down_proj.weight', 'model.layers.34.mlp.down_proj.weight', 'model.layers.39.mlp.gate_proj.weight', 'model.layers.32.mlp.up_proj.weight', 'model.layers.26.mlp.down_proj.weight', 'model.layers.36.mlp.up_proj.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.34.mlp.gate_proj.weight', 'model.layers.38.mlp.up_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.36.mlp.gate_proj.weight', 'model.layers.38.mlp.down_proj.weight', 'model.layers.35.mlp.up_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.28.mlp.down_proj.weight', 
'model.layers.32.mlp.down_proj.weight', 'model.layers.28.mlp.gate_proj.weight', 'model.layers.34.mlp.up_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.39.mlp.up_proj.weight'}, {'model.layers.37.input_layernorm.weight', 'model.layers.32.post_attention_layernorm.weight', 'model.layers.35.input_layernorm.weight', 'model.layers.35.post_attention_layernorm.weight', 'model.layers.31.input_layernorm.weight', 'model.layers.26.input_layernorm.weight', 'model.layers.36.input_layernorm.weight', 'model.layers.34.post_attention_layernorm.weight', 'model.layers.27.post_attention_layernorm.weight', 'model.layers.27.input_layernorm.weight', 'model.layers.37.post_attention_layernorm.weight', 'model.norm.weight', 'model.layers.28.post_attention_layernorm.weight', 'model.layers.38.post_attention_layernorm.weight', 'model.layers.34.input_layernorm.weight', 'model.layers.30.input_layernorm.weight', 'model.layers.38.input_layernorm.weight', 'model.layers.30.post_attention_layernorm.weight', 'model.layers.29.post_attention_layernorm.weight', 'model.layers.32.input_layernorm.weight', 'model.layers.28.input_layernorm.weight', 'model.layers.31.post_attention_layernorm.weight', 'model.layers.39.input_layernorm.weight', 'model.layers.33.input_layernorm.weight', 'model.layers.26.post_attention_layernorm.weight', 'model.layers.39.post_attention_layernorm.weight', 'model.layers.29.input_layernorm.weight', 'model.layers.36.post_attention_layernorm.weight', 'model.layers.33.post_attention_layernorm.weight'}] that are mismatching the transformers base configuration. Try saving using
`safe_serialization=False` or remove this tensor sharing.
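For what it's worth, this second error is the safetensors save path refusing tensors that alias the same storage: `save_pretrained` groups parameters that share memory and, since safetensors cannot represent that sharing, asks you to either break it or fall back to `safe_serialization=False` (pickle-based `pytorch_model.bin`). A toy sketch of how such aliasing is detected, using object identity on plain Python lists in place of tensor storages (the helper name `find_shared` is mine, not the real safetensors code):

```python
def find_shared(state_dict):
    """Group parameter names that alias the same underlying object,
    roughly what the shared-tensor check does with tensor storages."""
    by_id = {}
    for name, tensor in state_dict.items():
        by_id.setdefault(id(tensor), set()).add(name)
    return [names for names in by_id.values() if len(names) > 1]

w = [1.0, 2.0]  # stands in for one tensor's underlying storage
state = {
    "model.layers.26.self_attn.q_proj.weight": w,  # aliases w
    "model.layers.27.self_attn.q_proj.weight": w,  # aliases w -> shared group
    "model.norm.weight": [0.5],                    # unique storage, not reported
}
shared = find_shared(state)
# shared contains one group: the two q_proj names that alias w
```

That so many layer-26+ weights land in one group here suggests the model was loaded with meta/offloaded placeholder tensors rather than genuinely tied weights, which is why the save step trips over them.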
Is CodeLlama-13b not supported? Should I be using a different model?
Do you know if Colab works fine with CodeLlama-13b? It should work.
Sorry, I don't know about Colab.
Is unsloth compatible with AWS?
Oh, I meant did you try via https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing - it has some free GPU time, so it's good for experimentation. If our Colabs break, then there's something wrong.
I have the same problem with codellama-13b-bnb-4bit.
I went to Colab, switched the model name and got the same error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-3edea52bfdfc> in <cell line: 20>()
18 ] # More models at https://huggingface.co/unsloth
19
---> 20 model, tokenizer = FastLanguageModel.from_pretrained(
21 model_name = "unsloth/codellama-13b-bnb-4bit",
22 max_seq_length = max_seq_length,
[... 6 frames hidden ...]
/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py in create_quantized_param(self, model, param_value, param_name, target_device, state_dict, unexpected_keys)
188 param_name + ".quant_state.bitsandbytes__nf4" not in state_dict
189 ):
--> 190 raise ValueError(
191 f"Supplied state dict for {param_name} does not contain `bitsandbytes__*` and possibly other `quantized_stats` components."
192 )
ValueError: Supplied state dict for model.layers.28.mlp.gate_proj.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.
Apologies, I just relocated to SF, hence the slowness! Will investigate this!