Some weights of the model checkpoint at `finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM

Question

Some weights of the model checkpoint at `finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM

noraise opened this issue 3 months ago · comments

I get the following error after finetuning this model on the R dataset following the example in the README.

Some weights of the model checkpoint at finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM: ['model.layers.0.self_attn.k_proj.base_layer.bias', 'model.layers.0.self_attn.k_proj.base_layer.weight', 'model.layers.0.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.k_proj.lora_A.default.weight', 'model.layers.0.self_attn.k_proj.lora_B.default.weight', 'model.layers.0.self_attn.o_proj.base_layer.bias', 'model.layers.0.self_attn.o_proj.base_layer.weight', 'model.layers.0.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.o_proj.lora_A.default.weight', 'model.layers.0.self_attn.o_proj.lora_B.default.weight', 'model.layers.0.self_attn.q_proj.base_layer.bias', 'model.layers.0.self_attn.q_proj.base_layer.weight', 'model.layers.0.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.q_proj.lora_A.default.weight', 'model.layers.0.self_attn.q_proj.lora_B.default.weight', 'model.layers.0.self_attn.v_proj.base_layer.bias', 'model.layers.0.self_attn.v_proj.base_layer.weight', 'model.layers.0.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.v_proj.lora_A.default.weight', 'model.layers.0.self_attn.v_proj.lora_B.default.weight', 'model.layers.1.self_attn.k_proj.base_layer.bias', 'model.layers.1.self_attn.k_proj.base_layer.weight', 'model.layers.1.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.k_proj.lora_A.default.weight', 'model.layers.1.self_attn.k_proj.lora_B.default.weight', 'model.layers.1.self_attn.o_proj.base_layer.bias', 'model.layers.1.self_attn.o_proj.base_layer.weight', 'model.layers.1.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.o_proj.lora_A.default.weight', 'model.layers.1.self_attn.o_proj.lora_B.default.weight', 'model.layers.1.self_attn.q_proj.base_layer.bias', 'model.layers.1.self_attn.q_proj.base_layer.weight', 'model.layers.1.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.q_proj.lora_A.default.weight', 'model.layers.1.self_attn.q_proj.lora_B.default.weight', 'model.layers.1.self_attn.v_proj.base_layer.bias', 'model.layers.1.self_attn.v_proj.base_layer.weight', 'model.layers.1.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.v_proj.lora_A.default.weight', 'model.layers.1.self_attn.v_proj.lora_B.default.weight', 'model.layers.10.self_attn.k_proj.base_layer.bias', 'model.layers.10.self_attn.k_proj.base_layer.weight', 'model.layers.10.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.k_proj.lora_A.default.weight', 'model.layers.10.self_attn.k_proj.lora_B.default.weight', 'model.layers.10.self_attn.o_proj.base_layer.bias', 'model.layers.10.self_attn.o_proj.base_layer.weight', 'model.layers.10.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.o_proj.lora_A.default.weight', 'model.layers.10.self_attn.o_proj.lora_B.default.weight', 'model.layers.10.self_attn.q_proj.base_layer.bias', 'model.layers.10.self_attn.q_proj.base_layer.weight', 'model.layers.10.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.q_proj.lora_A.default.weight', 'model.layers.10.self_attn.q_proj.lora_B.default.weight', 'model.layers.10.self_attn.v_proj.base_layer.bias', 'model.layers.10.self_attn.v_proj.base_layer.weight', 'model.layers.10.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.v_proj.lora_A.default.weight', 'model.layers.10.self_attn.v_proj.lora_B.default.weight', 'model.layers.11.self_attn.k_proj.base_layer.bias', 'model.layers.11.self_attn.k_proj.base_layer.weight', 'model.layers.11.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.k_proj.lora_A.default.weight', 'model.layers.11.self_attn.k_proj.lora_B.default.weight', 'model.layers.11.self_attn.o_proj.base_layer.bias', 'model.layers.11.self_attn.o_proj.base_layer.weight', 'model.layers.11.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.o_proj.lora_A.default.weight', 'model.layers.11.self_attn.o_proj.lora_B.default.weight', 'model.layers.11.self_attn.q_proj.base_layer.bias', 'model.layers.11.self_attn.q_proj.base_layer.weight', 'model.layers.11.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.q_proj.lora_A.default.weight', 'model.layers.11.self_attn.q_proj.lora_B.default.weight', 'model.layers.11.self_attn.v_proj.base_layer.bias', 'model.layers.11.self_attn.v_proj.base_layer.weight', 'model.layers.11.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.v_proj.lora_A.default.weight', 'model.layers.11.self_attn.v_proj.lora_B.default.weight', 'model.layers.12.self_attn.k_proj.base_layer.bias', 'model.layers.12.self_attn.k_proj.base_layer.weight', 'model.layers.12.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.k_proj.lora_A.default.weight', 'model.layers.12.self_attn.k_proj.lora_B.default.weight', 'model.layers.12.self_attn.o_proj.base_layer.bias', 'model.layers.12.self_attn.o_proj.base_layer.weight', 'model.layers.12.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.o_proj.lora_A.default.weight', 'model.layers.12.self_attn.o_proj.lora_B.default.weight', 'model.layers.12.self_attn.q_proj.base_layer.bias', 'model.layers.12.self_attn.q_proj.base_layer.weight', 'model.layers.12.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.q_proj.lora_A.default.weight', 'model.layers.12.self_attn.q_proj.lora_B.default.weight', 'model.layers.12.self_attn.v_proj.base_layer.bias', 'model.layers.12.self_attn.v_proj.base_layer.weight', 'model.layers.12.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.v_proj.lora_A.default.weight', 'model.layers.12.self_attn.v_proj.lora_B.default.weight', 'model.layers.13.self_attn.k_proj.base_layer.bias', 'model.layers.13.self_attn.k_proj.base_layer.weight', 'model.layers.13.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.k_proj.lora_A.default.weight', 'model.layers.13.self_attn.k_proj.lora_B.default.weight', 'model.layers.13.self_attn.o_proj.base_layer.bias', 'model.layers.13.self_attn.o_proj.base_layer.weight', 'model.layers.13.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.o_proj.lora_A.default.weight', 'model.layers.13.self_attn.o_proj.lora_B.default.weight', 'model.layers.13.self_attn.q_proj.base_layer.bias', 'model.layers.13.self_attn.q_proj.base_layer.weight', 'model.layers.13.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.q_proj.lora_A.default.weight', 'model.layers.13.self_attn.q_proj.lora_B.default.weight', 'model.layers.13.self_attn.v_proj.base_layer.bias', 'model.layers.13.self_attn.v_proj.base_layer.weight', 'model.layers.13.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.v_proj.lora_A.default.weight', 'model.layers.13.self_attn.v_proj.lora_B.default.weight', 'model.layers.14.self_attn.k_proj.base_layer.bias', 'model.layers.14.self_attn.k_proj.base_layer.weight', 'model.layers.14.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.k_proj.lora_A.default.weight', 'model.layers.14.self_attn.k_proj.lora_B.default.weight', 'model.layers.14.self_attn.o_proj.base_layer.bias', 'model.layers.14.self_attn.o_proj.base_layer.weight', 'model.layers.14.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.o_proj.lora_A.default.weight', 'model.layers.14.self_attn.o_proj.lora_B.default.weight', 'model.layers.14.self_attn.q_proj.base_layer.bias', 'model.layers.14.self_attn.q_proj.base_layer.weight', 'model.layers.14.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.q_proj.lora_A.default.weight', 'model.layers.14.self_attn.q_proj.lora_B.default.weight', 'model.layers.14.self_attn.v_proj.base_layer.bias', 'model.layers.14.self_attn.v_proj.base_layer.weight', 'model.layers.14.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.v_proj.lora_A.default.weight', 'model.layers.14.self_attn.v_proj.lora_B.default.weight', 'model.layers.15.self_attn.k_proj.base_layer.bias', 'model.layers.15.self_attn.k_proj.base_layer.weight', 'model.layers.15.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.k_proj.lora_A.default.weight', 'model.layers.15.self_attn.k_proj.lora_B.default.weight', 'model.layers.15.self_attn.o_proj.base_layer.bias', 'model.layers.15.self_attn.o_proj.base_layer.weight', 'model.layers.15.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.o_proj.lora_A.default.weight', 'model.layers.15.self_attn.o_proj.lora_B.default.weight', 'model.layers.15.self_attn.q_proj.base_layer.bias', 'model.layers.15.self_attn.q_proj.base_layer.weight', 'model.layers.15.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.q_proj.lora_A.default.weight', 'model.layers.15.self_attn.q_proj.lora_B.default.weight', 'model.layers.15.self_attn.v_proj.base_layer.bias', 'model.layers.15.self_attn.v_proj.base_layer.weight', 'model.layers.15.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.v_proj.lora_A.default.weight', 'model.layers.15.self_attn.v_proj.lora_B.default.weight', 'model.layers.16.self_attn.k_proj.base_layer.bias', 'model.layers.16.self_attn.k_proj.base_layer.weight', 'model.layers.16.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.k_proj.lora_A.default.weight', 'model.layers.16.self_attn.k_proj.lora_B.default.weight', 'model.layers.16.self_attn.o_proj.base_layer.bias', 'model.layers.16.self_attn.o_proj.base_layer.weight', 'model.layers.16.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.o_proj.lora_A.default.weight', 'model.layers.16.self_attn.o_proj.lora_B.default.weight', 'model.layers.16.self_attn.q_proj.base_layer.bias', 'model.layers.16.self_attn.q_proj.base_layer.weight', 'model.layers.16.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.q_proj.lora_A.default.weight', 'model.layers.16.self_attn.q_proj.lora_B.default.weight', 'model.layers.16.self_attn.v_proj.base_layer.bias', 'model.layers.16.self_attn.v_proj.base_layer.weight', 'model.layers.16.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.v_proj.lora_A.default.weight', 'model.layers.16.self_attn.v_proj.lora_B.default.weight', 'model.layers.17.self_attn.k_proj.base_layer.bias', 'model.layers.17.self_attn.k_proj.base_layer.weight', 'model.layers.17.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.k_proj.lora_A.default.weight', 'model.layers.17.self_attn.k_proj.lora_B.default.weight', 'model.layers.17.self_attn.o_proj.base_layer.bias', 'model.layers.17.self_attn.o_proj.base_layer.weight', 'model.layers.17.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.o_proj.lora_A.default.weight', 'model.layers.17.self_attn.o_proj.lora_B.default.weight', 'model.layers.17.self_attn.q_proj.base_layer.bias', 'model.layers.17.self_attn.q_proj.base_layer.weight', 'model.layers.17.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.q_proj.lora_A.default.weight', 'model.layers.17.self_attn.q_proj.lora_B.default.weight', 'model.layers.17.self_attn.v_proj.base_layer.bias', 'model.layers.17.self_attn.v_proj.base_layer.weight', 'model.layers.17.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.v_proj.lora_A.default.weight', 'model.layers.17.self_attn.v_proj.lora_B.default.weight', 'model.layers.18.self_attn.k_proj.base_layer.bias', 'model.layers.18.self_attn.k_proj.base_layer.weight', 'model.layers.18.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.k_proj.lora_A.default.weight', 'model.layers.18.self_attn.k_proj.lora_B.default.weight', 'model.layers.18.self_attn.o_proj.base_layer.bias', 'model.layers.18.self_attn.o_proj.base_layer.weight', 'model.layers.18.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.o_proj.lora_A.default.weight', 'model.layers.18.self_attn.o_proj.lora_B.default.weight', 'model.layers.18.self_attn.q_proj.base_layer.bias', 'model.layers.18.self_attn.q_proj.base_layer.weight', 'model.layers.18.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.q_proj.lora_A.default.weight', 'model.layers.18.self_attn.q_proj.lora_B.default.weight', 'model.layers.18.self_attn.v_proj.base_layer.bias', 'model.layers.18.self_attn.v_proj.base_layer.weight', 'model.layers.18.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.v_proj.lora_A.default.weight', 'model.layers.18.self_attn.v_proj.lora_B.default.weight', 'model.layers.19.self_attn.k_proj.base_layer.bias', 'model.layers.19.self_attn.k_proj.base_layer.weight', 'model.layers.19.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.k_proj.lora_A.default.weight', 'model.layers.19.self_attn.k_proj.lora_B.default.weight', 'model.layers.19.self_attn.o_proj.base_layer.bias', 'model.layers.19.self_attn.o_proj.base_layer.weight', 'model.layers.19.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.o_proj.lora_A.default.weight', 'model.layers.19.self_attn.o_proj.lora_B.default.weight', 'model.layers.19.self_attn.q_proj.base_layer.bias', 'model.layers.19.self_attn.q_proj.base_layer.weight', 'model.layers.19.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.q_proj.lora_A.default.weight', 'model.layers.19.self_attn.q_proj.lora_B.default.weight', 'model.layers.19.self_attn.v_proj.base_layer.bias', 'model.layers.19.self_attn.v_proj.base_layer.weight', 'model.layers.19.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.v_proj.lora_A.default.weight', 'model.layers.19.self_attn.v_proj.lora_B.default.weight', 'model.layers.2.self_attn.k_proj.base_layer.bias', 'model.layers.2.self_attn.k_proj.base_layer.weight', 'model.layers.2.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.k_proj.lora_A.default.weight', 'model.layers.2.self_attn.k_proj.lora_B.default.weight', 'model.layers.2.self_attn.o_proj.base_layer.bias', 'model.layers.2.self_attn.o_proj.base_layer.weight', 'model.layers.2.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.o_proj.lora_A.default.weight', 'model.layers.2.self_attn.o_proj.lora_B.default.weight', 'model.layers.2.self_attn.q_proj.base_layer.bias', 'model.layers.2.self_attn.q_proj.base_layer.weight', 'model.layers.2.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.q_proj.lora_A.default.weight', 'model.layers.2.self_attn.q_proj.lora_B.default.weight', 'model.layers.2.self_attn.v_proj.base_layer.bias', 'model.layers.2.self_attn.v_proj.base_layer.weight', 'model.layers.2.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.v_proj.lora_A.default.weight', 'model.layers.2.self_attn.v_proj.lora_B.default.weight', 'model.layers.20.self_attn.k_proj.base_layer.bias', 'model.layers.20.self_attn.k_proj.base_layer.weight', 'model.layers.20.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.k_proj.lora_A.default.weight', 'model.layers.20.self_attn.k_proj.lora_B.default.weight', 'model.layers.20.self_attn.o_proj.base_layer.bias', 'model.layers.20.self_attn.o_proj.base_layer.weight', 'model.layers.20.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.o_proj.lora_A.default.weight', 'model.layers.20.self_attn.o_proj.lora_B.default.weight', 'model.layers.20.self_attn.q_proj.base_layer.bias', 'model.layers.20.self_attn.q_proj.base_layer.weight', 'model.layers.20.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.q_proj.lora_A.default.weight', 'model.layers.20.self_attn.q_proj.lora_B.default.weight', 'model.layers.20.self_attn.v_proj.base_layer.bias', 'model.layers.20.self_attn.v_proj.base_layer.weight', 'model.layers.20.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.v_proj.lora_A.default.weight', 'model.layers.20.self_attn.v_proj.lora_B.default.weight', 'model.layers.21.self_attn.k_proj.base_layer.bias', 'model.layers.21.self_attn.k_proj.base_layer.weight', 'model.layers.21.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.k_proj.lora_A.default.weight', 'model.layers.21.self_attn.k_proj.lora_B.default.weight', 'model.layers.21.self_attn.o_proj.base_layer.bias', 'model.layers.21.self_attn.o_proj.base_layer.weight', 'model.layers.21.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.o_proj.lora_A.default.weight', 'model.layers.21.self_attn.o_proj.lora_B.default.weight', 'model.layers.21.self_attn.q_proj.base_layer.bias', 'model.layers.21.self_attn.q_proj.base_layer.weight', 'model.layers.21.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.q_proj.lora_A.default.weight', 'model.layers.21.self_attn.q_proj.lora_B.default.weight', 'model.layers.21.self_attn.v_proj.base_layer.bias', 'model.layers.21.self_attn.v_proj.base_layer.weight', 'model.layers.21.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.v_proj.lora_A.default.weight', 'model.layers.21.self_attn.v_proj.lora_B.default.weight', 'model.layers.22.self_attn.k_proj.base_layer.bias', 'model.layers.22.self_attn.k_proj.base_layer.weight', 'model.layers.22.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.k_proj.lora_A.default.weight', 'model.layers.22.self_attn.k_proj.lora_B.default.weight', 'model.layers.22.self_attn.o_proj.base_layer.bias', 'model.layers.22.self_attn.o_proj.base_layer.weight', 'model.layers.22.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.o_proj.lora_A.default.weight', 'model.layers.22.self_attn.o_proj.lora_B.default.weight', 'model.layers.22.self_attn.q_proj.base_layer.bias', 'model.layers.22.self_attn.q_proj.base_layer.weight', 'model.layers.22.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.q_proj.lora_A.default.weight', 'model.layers.22.self_attn.q_proj.lora_B.default.weight', 'model.layers.22.self_attn.v_proj.base_layer.bias', 'model.layers.22.self_attn.v_proj.base_layer.weight', 'model.layers.22.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.v_proj.lora_A.default.weight', 'model.layers.22.self_attn.v_proj.lora_B.default.weight', 'model.layers.23.self_attn.k_proj.base_layer.bias', 'model.layers.23.self_attn.k_proj.base_layer.weight', 'model.layers.23.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.k_proj.lora_A.default.weight', 'model.layers.23.self_attn.k_proj.lora_B.default.weight', 'model.layers.23.self_attn.o_proj.base_layer.bias', 'model.layers.23.self_attn.o_proj.base_layer.weight', 'model.layers.23.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.o_proj.lora_A.default.weight', 'model.layers.23.self_attn.o_proj.lora_B.default.weight', 'model.layers.23.self_attn.q_proj.base_layer.bias', 'model.layers.23.self_attn.q_proj.base_layer.weight', 'model.layers.23.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.q_proj.lora_A.default.weight', 'model.layers.23.self_attn.q_proj.lora_B.default.weight', 'model.layers.23.self_attn.v_proj.base_layer.bias', 'model.layers.23.self_attn.v_proj.base_layer.weight', 'model.layers.23.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.v_proj.lora_A.default.weight', 'model.layers.23.self_attn.v_proj.lora_B.default.weight', 'model.layers.24.self_attn.k_proj.base_layer.bias', 'model.layers.24.self_attn.k_proj.base_layer.weight', 'model.layers.24.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.k_proj.lora_A.default.weight', 'model.layers.24.self_attn.k_proj.lora_B.default.weight', 'model.layers.24.self_attn.o_proj.base_layer.bias', 'model.layers.24.self_attn.o_proj.base_layer.weight', 'model.layers.24.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.o_proj.lora_A.default.weight', 'model.layers.24.self_attn.o_proj.lora_B.default.weight', 'model.layers.24.self_attn.q_proj.base_layer.bias', 'model.layers.24.self_attn.q_proj.base_layer.weight', 'model.layers.24.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.q_proj.lora_A.default.weight', 'model.layers.24.self_attn.q_proj.lora_B.default.weight', 'model.layers.24.self_attn.v_proj.base_layer.bias', 'model.layers.24.self_attn.v_proj.base_layer.weight', 'model.layers.24.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.v_proj.lora_A.default.weight', 'model.layers.24.self_attn.v_proj.lora_B.default.weight', 'model.layers.25.self_attn.k_proj.base_layer.bias', 'model.layers.25.self_attn.k_proj.base_layer.weight', 'model.layers.25.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.k_proj.lora_A.default.weight', 'model.layers.25.self_attn.k_proj.lora_B.default.weight', 'model.layers.25.self_attn.o_proj.base_layer.bias', 'model.layers.25.self_attn.o_proj.base_layer.weight', 'model.layers.25.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.o_proj.lora_A.default.weight', 'model.layers.25.self_attn.o_proj.lora_B.default.weight', 'model.layers.25.self_attn.q_proj.base_layer.bias', 'model.layers.25.self_attn.q_proj.base_layer.weight', 'model.layers.25.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.q_proj.lora_A.default.weight', 'model.layers.25.self_attn.q_proj.lora_B.default.weight', 'model.layers.25.self_attn.v_proj.base_layer.bias', 'model.layers.25.self_attn.v_proj.base_layer.weight', 'model.layers.25.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.v_proj.lora_A.default.weight', 'model.layers.25.self_attn.v_proj.lora_B.default.weight', 'model.layers.26.self_attn.k_proj.base_layer.bias', 'model.layers.26.self_attn.k_proj.base_layer.weight', 'model.layers.26.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.k_proj.lora_A.default.weight', 'model.layers.26.self_attn.k_proj.lora_B.default.weight', 'model.layers.26.self_attn.o_proj.base_layer.bias', 'model.layers.26.self_attn.o_proj.base_layer.weight', 'model.layers.26.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.o_proj.lora_A.default.weight', 'model.layers.26.self_attn.o_proj.lora_B.default.weight', 'model.layers.26.self_attn.q_proj.base_layer.bias', 'model.layers.26.self_attn.q_proj.base_layer.weight', 'model.layers.26.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.q_proj.lora_A.default.weight', 'model.layers.26.self_attn.q_proj.lora_B.default.weight', 'model.layers.26.self_attn.v_proj.base_layer.bias', 'model.layers.26.self_attn.v_proj.base_layer.weight', 'model.layers.26.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.v_proj.lora_A.default.weight', 'model.layers.26.self_attn.v_proj.lora_B.default.weight', 'model.layers.27.self_attn.k_proj.base_layer.bias', 'model.layers.27.self_attn.k_proj.base_layer.weight', 'model.layers.27.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.k_proj.lora_A.default.weight', 'model.layers.27.self_attn.k_proj.lora_B.default.weight', 'model.layers.27.self_attn.o_proj.base_layer.bias', 'model.layers.27.self_attn.o_proj.base_layer.weight', 'model.layers.27.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.o_proj.lora_A.default.weight', 'model.layers.27.self_attn.o_proj.lora_B.default.weight', 'model.layers.27.self_attn.q_proj.base_layer.bias', 'model.layers.27.self_attn.q_proj.base_layer.weight', 'model.layers.27.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.q_proj.lora_A.default.weight', 'model.layers.27.self_attn.q_proj.lora_B.default.weight', 'model.layers.27.self_attn.v_proj.base_layer.bias', 'model.layers.27.self_attn.v_proj.base_layer.weight', 'model.layers.27.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.v_proj.lora_A.default.weight', 'model.layers.27.self_attn.v_proj.lora_B.default.weight', 'model.layers.28.self_attn.k_proj.base_layer.bias', 'model.layers.28.self_attn.k_proj.base_layer.weight', 'model.layers.28.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.k_proj.lora_A.default.weight', 'model.layers.28.self_attn.k_proj.lora_B.default.weight', 'model.layers.28.self_attn.o_proj.base_layer.bias', 'model.layers.28.self_attn.o_proj.base_layer.weight', 'model.layers.28.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.o_proj.lora_A.default.weight', 'model.layers.28.self_attn.o_proj.lora_B.default.weight', 'model.layers.28.self_attn.q_proj.base_layer.bias', 'model.layers.28.self_attn.q_proj.base_layer.weight', 'model.layers.28.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.q_proj.lora_A.default.weight', 'model.layers.28.self_attn.q_proj.lora_B.default.weight', 'model.layers.28.self_attn.v_proj.base_layer.bias', 'model.layers.28.self_attn.v_proj.base_layer.weight', 'model.layers.28.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.v_proj.lora_A.default.weight', 'model.layers.28.self_attn.v_proj.lora_B.default.weight', 'model.layers.29.self_attn.k_proj.base_layer.bias', 'model.layers.29.self_attn.k_proj.base_layer.weight', 'model.layers.29.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.k_proj.lora_A.default.weight', 'model.layers.29.self_attn.k_proj.lora_B.default.weight', 'model.layers.29.self_attn.o_proj.base_layer.bias', 'model.layers.29.self_attn.o_proj.base_layer.weight', 'model.layers.29.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.o_proj.lora_A.default.weight', 'model.layers.29.self_attn.o_proj.lora_B.default.weight', 'model.layers.29.self_attn.q_proj.base_layer.bias', 'model.layers.29.self_attn.q_proj.base_layer.weight', 'model.layers.29.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.q_proj.lora_A.default.weight', 'model.layers.29.self_attn.q_proj.lora_B.default.weight', 'model.layers.29.self_attn.v_proj.base_layer.bias', 'model.layers.29.self_attn.v_proj.base_layer.weight', 'model.layers.29.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.v_proj.lora_A.default.weight', 'model.layers.29.self_attn.v_proj.lora_B.default.weight', 'model.layers.3.self_attn.k_proj.base_layer.bias', 'model.layers.3.self_attn.k_proj.base_layer.weight', 'model.layers.3.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.k_proj.lora_A.default.weight', 'model.layers.3.self_attn.k_proj.lora_B.default.weight', 'model.layers.3.self_attn.o_proj.base_layer.bias', 'model.layers.3.self_attn.o_proj.base_layer.weight', 'model.layers.3.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.o_proj.lora_A.default.weight', 'model.layers.3.self_attn.o_proj.lora_B.default.weight', 'model.layers.3.self_attn.q_proj.base_layer.bias', 'model.layers.3.self_attn.q_proj.base_layer.weight', 'model.layers.3.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.q_proj.lora_A.default.weight', 'model.layers.3.self_attn.q_proj.lora_B.default.weight', 'model.layers.3.self_attn.v_proj.base_layer.bias', 'model.layers.3.self_attn.v_proj.base_layer.weight', 'model.layers.3.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.v_proj.lora_A.default.weight', 'model.layers.3.self_attn.v_proj.lora_B.default.weight', 'model.layers.4.self_attn.k_proj.base_layer.bias', 'model.layers.4.self_attn.k_proj.base_layer.weight', 'model.layers.4.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.k_proj.lora_A.default.weight', 'model.layers.4.self_attn.k_proj.lora_B.default.weight', 'model.layers.4.self_attn.o_proj.base_layer.bias', 'model.layers.4.self_attn.o_proj.base_layer.weight', 'model.layers.4.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.o_proj.lora_A.default.weight', 'model.layers.4.self_attn.o_proj.lora_B.default.weight', 'model.layers.4.self_attn.q_proj.base_layer.bias', 'model.layers.4.self_attn.q_proj.base_layer.weight', 'model.layers.4.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.q_proj.lora_A.default.weight', 'model.layers.4.self_attn.q_proj.lora_B.default.weight', 'model.layers.4.self_attn.v_proj.base_layer.bias', 'model.layers.4.self_attn.v_proj.base_layer.weight', 'model.layers.4.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.v_proj.lora_A.default.weight', 'model.layers.4.self_attn.v_proj.lora_B.default.weight', 'model.layers.5.self_attn.k_proj.base_layer.bias', 'model.layers.5.self_attn.k_proj.base_layer.weight', 'model.layers.5.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.k_proj.lora_A.default.weight', 'model.layers.5.self_attn.k_proj.lora_B.default.weight', 'model.layers.5.self_attn.o_proj.base_layer.bias', 'model.layers.5.self_attn.o_proj.base_layer.weight', 'model.layers.5.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.o_proj.lora_A.default.weight', 'model.layers.5.self_attn.o_proj.lora_B.default.weight', 'model.layers.5.self_attn.q_proj.base_layer.bias', 'model.layers.5.self_attn.q_proj.base_layer.weight', 'model.layers.5.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.q_proj.lora_A.default.weight', 'model.layers.5.self_attn.q_proj.lora_B.default.weight', 'model.layers.5.self_attn.v_proj.base_layer.bias', 'model.layers.5.self_attn.v_proj.base_layer.weight', 'model.layers.5.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.v_proj.lora_A.default.weight', 'model.layers.5.self_attn.v_proj.lora_B.default.weight', 'model.layers.6.self_attn.k_proj.base_layer.bias', 'model.layers.6.self_attn.k_proj.base_layer.weight', 'model.layers.6.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.k_proj.lora_A.default.weight', 'model.layers.6.self_attn.k_proj.lora_B.default.weight', 'model.layers.6.self_attn.o_proj.base_layer.bias', 'model.layers.6.self_attn.o_proj.base_layer.weight', 'model.layers.6.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.o_proj.lora_A.default.weight', 'model.layers.6.self_attn.o_proj.lora_B.default.weight', 'model.layers.6.self_attn.q_proj.base_layer.bias', 'model.layers.6.self_attn.q_proj.base_layer.weight', 'model.layers.6.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.q_proj.lora_A.default.weight', 'model.layers.6.self_attn.q_proj.lora_B.default.weight', 'model.layers.6.self_attn.v_proj.base_layer.bias', 'model.layers.6.self_attn.v_proj.base_layer.weight', 'model.layers.6.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.v_proj.lora_A.default.weight', 'model.layers.6.self_attn.v_proj.lora_B.default.weight', 'model.layers.7.self_attn.k_proj.base_layer.bias', 'model.layers.7.self_attn.k_proj.base_layer.weight', 'model.layers.7.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.k_proj.lora_A.default.weight', 'model.layers.7.self_attn.k_proj.lora_B.default.weight', 'model.layers.7.self_attn.o_proj.base_layer.bias', 'model.layers.7.self_attn.o_proj.base_layer.weight', 'model.layers.7.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.o_proj.lora_A.default.weight', 'model.layers.7.self_attn.o_proj.lora_B.default.weight', 'model.layers.7.self_attn.q_proj.base_layer.bias', 'model.layers.7.self_attn.q_proj.base_layer.weight', 'model.layers.7.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.q_proj.lora_A.default.weight', 'model.layers.7.self_attn.q_proj.lora_B.default.weight', 'model.layers.7.self_attn.v_proj.base_layer.bias', 'model.layers.7.self_attn.v_proj.base_layer.weight', 'model.layers.7.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.v_proj.lora_A.default.weight', 'model.layers.7.self_attn.v_proj.lora_B.default.weight', 'model.layers.8.self_attn.k_proj.base_layer.bias', 'model.layers.8.self_attn.k_proj.base_layer.weight', 'model.layers.8.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.k_proj.lora_A.default.weight', 'model.layers.8.self_attn.k_proj.lora_B.default.weight', 'model.layers.8.self_attn.o_proj.base_layer.bias', 'model.layers.8.self_attn.o_proj.base_layer.weight', 'model.layers.8.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.o_proj.lora_A.default.weight', 'model.layers.8.self_attn.o_proj.lora_B.default.weight', 'model.layers.8.self_attn.q_proj.base_layer.bias', 'model.layers.8.self_attn.q_proj.base_layer.weight', 'model.layers.8.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.q_proj.lora_A.default.weight', 'model.layers.8.self_attn.q_proj.lora_B.default.weight', 'model.layers.8.self_attn.v_proj.base_layer.bias', 'model.layers.8.self_attn.v_proj.base_layer.weight', 'model.layers.8.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.v_proj.lora_A.default.weight', 'model.layers.8.self_attn.v_proj.lora_B.default.weight', 'model.layers.9.self_attn.k_proj.base_layer.bias', 'model.layers.9.self_attn.k_proj.base_layer.weight', 'model.layers.9.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.k_proj.lora_A.default.weight', 'model.layers.9.self_attn.k_proj.lora_B.default.weight', 'model.layers.9.self_attn.o_proj.base_layer.bias', 'model.layers.9.self_attn.o_proj.base_layer.weight', 'model.layers.9.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.o_proj.lora_A.default.weight', 'model.layers.9.self_attn.o_proj.lora_B.default.weight', 'model.layers.9.self_attn.q_proj.base_layer.bias', 'model.layers.9.self_attn.q_proj.base_layer.weight', 'model.layers.9.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.q_proj.lora_A.default.weight', 'model.layers.9.self_attn.q_proj.lora_B.default.weight', 'model.layers.9.self_attn.v_proj.base_layer.bias', 'model.layers.9.self_attn.v_proj.base_layer.weight', 'model.layers.9.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.v_proj.lora_A.default.weight', 'model.layers.9.self_attn.v_proj.lora_B.default.weight']
- This IS expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Starcoder2ForCausalLM were not initialized from the model checkpoint at finetune_starcoder2/final_checkpoint and are newly initialized: ['model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.24.self_attn.o_proj.bias', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.29.self_attn.q_proj.bias', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.9.self_attn.o_proj.bias', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.bias', 'model.layers.9.self_attn.v_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig()

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained("finetune_starcoder2/final_checkpoint", quantization_config=quantization_config)

inputs = tokenizer.encode("hello_world_function <- function() {", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Also, I don't think doing 4-bit quantization as a default for finetuning is a good idea. It should be opt-in with a flag.

I am also wondering why do we use the Stack v1 for finetuning and not the Stack v2?

Julio Sánchez · Answer 1 · Tue Mar 19 2024 00:12:35 GMT+0800 (China Standard Time)

I'm having the same problem. Is this the correct way to load the fine-tuned model? Is there no need to merge the lora adapter?

Julio Sánchez · Answer 2 · Wed Mar 20 2024 01:12:12 GMT+0800 (China Standard Time)

I get the following error after finetuning this model on the R dataset following the example in the README.

Some weights of the model checkpoint at finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM: ['model.layers.0.self_attn.k_proj.base_layer.bias', 'model.layers.0.self_attn.k_proj.base_layer.weight', 'model.layers.0.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.k_proj.lora_A.default.weight', 'model.layers.0.self_attn.k_proj.lora_B.default.weight', 'model.layers.0.self_attn.o_proj.base_layer.bias', 'model.layers.0.self_attn.o_proj.base_layer.weight', 'model.layers.0.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.o_proj.lora_A.default.weight', 'model.layers.0.self_attn.o_proj.lora_B.default.weight', 'model.layers.0.self_attn.q_proj.base_layer.bias', 'model.layers.0.self_attn.q_proj.base_layer.weight', 'model.layers.0.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.q_proj.lora_A.default.weight', 'model.layers.0.self_attn.q_proj.lora_B.default.weight', 'model.layers.0.self_attn.v_proj.base_layer.bias', 'model.layers.0.self_attn.v_proj.base_layer.weight', 'model.layers.0.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.v_proj.lora_A.default.weight', 'model.layers.0.self_attn.v_proj.lora_B.default.weight', 'model.layers.1.self_attn.k_proj.base_layer.bias', 'model.layers.1.self_attn.k_proj.base_layer.weight', 'model.layers.1.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.k_proj.lora_A.default.weight', 'model.layers.1.self_attn.k_proj.lora_B.default.weight', 'model.layers.1.self_attn.o_proj.base_layer.bias', 'model.layers.1.self_attn.o_proj.base_layer.weight', 'model.layers.1.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.o_proj.lora_A.default.weight', 'model.layers.1.self_attn.o_proj.lora_B.default.weight', 'model.layers.1.self_attn.q_proj.base_layer.bias', 'model.layers.1.self_attn.q_proj.base_layer.weight', 'model.layers.1.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.q_proj.lora_A.default.weight', 'model.layers.1.self_attn.q_proj.lora_B.default.weight', 'model.layers.1.self_attn.v_proj.base_layer.bias', 'model.layers.1.self_attn.v_proj.base_layer.weight', 'model.layers.1.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.v_proj.lora_A.default.weight', 'model.layers.1.self_attn.v_proj.lora_B.default.weight', 'model.layers.10.self_attn.k_proj.base_layer.bias', 'model.layers.10.self_attn.k_proj.base_layer.weight', 'model.layers.10.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.k_proj.lora_A.default.weight', 'model.layers.10.self_attn.k_proj.lora_B.default.weight', 'model.layers.10.self_attn.o_proj.base_layer.bias', 'model.layers.10.self_attn.o_proj.base_layer.weight', 'model.layers.10.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.o_proj.lora_A.default.weight', 'model.layers.10.self_attn.o_proj.lora_B.default.weight', 'model.layers.10.self_attn.q_proj.base_layer.bias', 'model.layers.10.self_attn.q_proj.base_layer.weight', 'model.layers.10.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.q_proj.lora_A.default.weight', 'model.layers.10.self_attn.q_proj.lora_B.default.weight', 'model.layers.10.self_attn.v_proj.base_layer.bias', 'model.layers.10.self_attn.v_proj.base_layer.weight', 'model.layers.10.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.v_proj.lora_A.default.weight', 'model.layers.10.self_attn.v_proj.lora_B.default.weight', 'model.layers.11.self_attn.k_proj.base_layer.bias', 'model.layers.11.self_attn.k_proj.base_layer.weight', 'model.layers.11.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.k_proj.lora_A.default.weight', 'model.layers.11.self_attn.k_proj.lora_B.default.weight', 'model.layers.11.self_attn.o_proj.base_layer.bias', 'model.layers.11.self_attn.o_proj.base_layer.weight', 'model.layers.11.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.o_proj.lora_A.default.weight', 'model.layers.11.self_attn.o_proj.lora_B.default.weight', 'model.layers.11.self_attn.q_proj.base_layer.bias', 'model.layers.11.self_attn.q_proj.base_layer.weight', 'model.layers.11.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.q_proj.lora_A.default.weight', 'model.layers.11.self_attn.q_proj.lora_B.default.weight', 'model.layers.11.self_attn.v_proj.base_layer.bias', 'model.layers.11.self_attn.v_proj.base_layer.weight', 'model.layers.11.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.v_proj.lora_A.default.weight', 'model.layers.11.self_attn.v_proj.lora_B.default.weight', 'model.layers.12.self_attn.k_proj.base_layer.bias', 'model.layers.12.self_attn.k_proj.base_layer.weight', 'model.layers.12.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.k_proj.lora_A.default.weight', 'model.layers.12.self_attn.k_proj.lora_B.default.weight', 'model.layers.12.self_attn.o_proj.base_layer.bias', 'model.layers.12.self_attn.o_proj.base_layer.weight', 'model.layers.12.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.o_proj.lora_A.default.weight', 'model.layers.12.self_attn.o_proj.lora_B.default.weight', 'model.layers.12.self_attn.q_proj.base_layer.bias', 'model.layers.12.self_attn.q_proj.base_layer.weight', 'model.layers.12.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.q_proj.lora_A.default.weight', 'model.layers.12.self_attn.q_proj.lora_B.default.weight', 'model.layers.12.self_attn.v_proj.base_layer.bias', 'model.layers.12.self_attn.v_proj.base_layer.weight', 'model.layers.12.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.v_proj.lora_A.default.weight', 'model.layers.12.self_attn.v_proj.lora_B.default.weight', 'model.layers.13.self_attn.k_proj.base_layer.bias', 'model.layers.13.self_attn.k_proj.base_layer.weight', 'model.layers.13.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.k_proj.lora_A.default.weight', 'model.layers.13.self_attn.k_proj.lora_B.default.weight', 'model.layers.13.self_attn.o_proj.base_layer.bias', 'model.layers.13.self_attn.o_proj.base_layer.weight', 'model.layers.13.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.o_proj.lora_A.default.weight', 'model.layers.13.self_attn.o_proj.lora_B.default.weight', 'model.layers.13.self_attn.q_proj.base_layer.bias', 'model.layers.13.self_attn.q_proj.base_layer.weight', 'model.layers.13.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.q_proj.lora_A.default.weight', 'model.layers.13.self_attn.q_proj.lora_B.default.weight', 'model.layers.13.self_attn.v_proj.base_layer.bias', 'model.layers.13.self_attn.v_proj.base_layer.weight', 'model.layers.13.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.v_proj.lora_A.default.weight', 'model.layers.13.self_attn.v_proj.lora_B.default.weight', 'model.layers.14.self_attn.k_proj.base_layer.bias', 'model.layers.14.self_attn.k_proj.base_layer.weight', 'model.layers.14.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.k_proj.lora_A.default.weight', 'model.layers.14.self_attn.k_proj.lora_B.default.weight', 'model.layers.14.self_attn.o_proj.base_layer.bias', 'model.layers.14.self_attn.o_proj.base_layer.weight', 'model.layers.14.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.o_proj.lora_A.default.weight', 'model.layers.14.self_attn.o_proj.lora_B.default.weight', 'model.layers.14.self_attn.q_proj.base_layer.bias', 'model.layers.14.self_attn.q_proj.base_layer.weight', 'model.layers.14.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.q_proj.lora_A.default.weight', 'model.layers.14.self_attn.q_proj.lora_B.default.weight', 'model.layers.14.self_attn.v_proj.base_layer.bias', 'model.layers.14.self_attn.v_proj.base_layer.weight', 'model.layers.14.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.v_proj.lora_A.default.weight', 'model.layers.14.self_attn.v_proj.lora_B.default.weight', 'model.layers.15.self_attn.k_proj.base_layer.bias', 'model.layers.15.self_attn.k_proj.base_layer.weight', 'model.layers.15.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.k_proj.lora_A.default.weight', 'model.layers.15.self_attn.k_proj.lora_B.default.weight', 'model.layers.15.self_attn.o_proj.base_layer.bias', 'model.layers.15.self_attn.o_proj.base_layer.weight', 'model.layers.15.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.o_proj.lora_A.default.weight', 'model.layers.15.self_attn.o_proj.lora_B.default.weight', 'model.layers.15.self_attn.q_proj.base_layer.bias', 'model.layers.15.self_attn.q_proj.base_layer.weight', 'model.layers.15.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.q_proj.lora_A.default.weight', 'model.layers.15.self_attn.q_proj.lora_B.default.weight', 'model.layers.15.self_attn.v_proj.base_layer.bias', 'model.layers.15.self_attn.v_proj.base_layer.weight', 'model.layers.15.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.v_proj.lora_A.default.weight', 'model.layers.15.self_attn.v_proj.lora_B.default.weight', 'model.layers.16.self_attn.k_proj.base_layer.bias', 'model.layers.16.self_attn.k_proj.base_layer.weight', 'model.layers.16.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.k_proj.lora_A.default.weight', 'model.layers.16.self_attn.k_proj.lora_B.default.weight', 'model.layers.16.self_attn.o_proj.base_layer.bias', 'model.layers.16.self_attn.o_proj.base_layer.weight', 'model.layers.16.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.o_proj.lora_A.default.weight', 'model.layers.16.self_attn.o_proj.lora_B.default.weight', 'model.layers.16.self_attn.q_proj.base_layer.bias', 'model.layers.16.self_attn.q_proj.base_layer.weight', 'model.layers.16.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.q_proj.lora_A.default.weight', 'model.layers.16.self_attn.q_proj.lora_B.default.weight', 'model.layers.16.self_attn.v_proj.base_layer.bias', 'model.layers.16.self_attn.v_proj.base_layer.weight', 'model.layers.16.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.v_proj.lora_A.default.weight', 'model.layers.16.self_attn.v_proj.lora_B.default.weight', 'model.layers.17.self_attn.k_proj.base_layer.bias', 'model.layers.17.self_attn.k_proj.base_layer.weight', 'model.layers.17.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.k_proj.lora_A.default.weight', 'model.layers.17.self_attn.k_proj.lora_B.default.weight', 'model.layers.17.self_attn.o_proj.base_layer.bias', 'model.layers.17.self_attn.o_proj.base_layer.weight', 'model.layers.17.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.o_proj.lora_A.default.weight', 'model.layers.17.self_attn.o_proj.lora_B.default.weight', 'model.layers.17.self_attn.q_proj.base_layer.bias', 'model.layers.17.self_attn.q_proj.base_layer.weight', 'model.layers.17.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.q_proj.lora_A.default.weight', 'model.layers.17.self_attn.q_proj.lora_B.default.weight', 'model.layers.17.self_attn.v_proj.base_layer.bias', 'model.layers.17.self_attn.v_proj.base_layer.weight', 'model.layers.17.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.v_proj.lora_A.default.weight', 'model.layers.17.self_attn.v_proj.lora_B.default.weight', 'model.layers.18.self_attn.k_proj.base_layer.bias', 'model.layers.18.self_attn.k_proj.base_layer.weight', 'model.layers.18.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.k_proj.lora_A.default.weight', 'model.layers.18.self_attn.k_proj.lora_B.default.weight', 'model.layers.18.self_attn.o_proj.base_layer.bias', 'model.layers.18.self_attn.o_proj.base_layer.weight', 'model.layers.18.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.o_proj.lora_A.default.weight', 'model.layers.18.self_attn.o_proj.lora_B.default.weight', 'model.layers.18.self_attn.q_proj.base_layer.bias', 'model.layers.18.self_attn.q_proj.base_layer.weight', 'model.layers.18.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.q_proj.lora_A.default.weight', 'model.layers.18.self_attn.q_proj.lora_B.default.weight', 'model.layers.18.self_attn.v_proj.base_layer.bias', 'model.layers.18.self_attn.v_proj.base_layer.weight', 'model.layers.18.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.v_proj.lora_A.default.weight', 'model.layers.18.self_attn.v_proj.lora_B.default.weight', 'model.layers.19.self_attn.k_proj.base_layer.bias', 'model.layers.19.self_attn.k_proj.base_layer.weight', 'model.layers.19.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.k_proj.lora_A.default.weight', 'model.layers.19.self_attn.k_proj.lora_B.default.weight', 'model.layers.19.self_attn.o_proj.base_layer.bias', 'model.layers.19.self_attn.o_proj.base_layer.weight', 'model.layers.19.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.o_proj.lora_A.default.weight', 'model.layers.19.self_attn.o_proj.lora_B.default.weight', 'model.layers.19.self_attn.q_proj.base_layer.bias', 'model.layers.19.self_attn.q_proj.base_layer.weight', 'model.layers.19.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.q_proj.lora_A.default.weight', 'model.layers.19.self_attn.q_proj.lora_B.default.weight', 'model.layers.19.self_attn.v_proj.base_layer.bias', 'model.layers.19.self_attn.v_proj.base_layer.weight', 'model.layers.19.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.v_proj.lora_A.default.weight', 'model.layers.19.self_attn.v_proj.lora_B.default.weight', 'model.layers.2.self_attn.k_proj.base_layer.bias', 'model.layers.2.self_attn.k_proj.base_layer.weight', 'model.layers.2.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.k_proj.lora_A.default.weight', 'model.layers.2.self_attn.k_proj.lora_B.default.weight', 'model.layers.2.self_attn.o_proj.base_layer.bias', 'model.layers.2.self_attn.o_proj.base_layer.weight', 'model.layers.2.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.o_proj.lora_A.default.weight', 'model.layers.2.self_attn.o_proj.lora_B.default.weight', 'model.layers.2.self_attn.q_proj.base_layer.bias', 'model.layers.2.self_attn.q_proj.base_layer.weight', 'model.layers.2.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.q_proj.lora_A.default.weight', 'model.layers.2.self_attn.q_proj.lora_B.default.weight', 'model.layers.2.self_attn.v_proj.base_layer.bias', 'model.layers.2.self_attn.v_proj.base_layer.weight', 'model.layers.2.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.v_proj.lora_A.default.weight', 'model.layers.2.self_attn.v_proj.lora_B.default.weight', 'model.layers.20.self_attn.k_proj.base_layer.bias', 'model.layers.20.self_attn.k_proj.base_layer.weight', 'model.layers.20.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.k_proj.lora_A.default.weight', 'model.layers.20.self_attn.k_proj.lora_B.default.weight', 'model.layers.20.self_attn.o_proj.base_layer.bias', 'model.layers.20.self_attn.o_proj.base_layer.weight', 'model.layers.20.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.o_proj.lora_A.default.weight', 'model.layers.20.self_attn.o_proj.lora_B.default.weight', 'model.layers.20.self_attn.q_proj.base_layer.bias', 'model.layers.20.self_attn.q_proj.base_layer.weight', 'model.layers.20.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.q_proj.lora_A.default.weight', 'model.layers.20.self_attn.q_proj.lora_B.default.weight', 'model.layers.20.self_attn.v_proj.base_layer.bias', 'model.layers.20.self_attn.v_proj.base_layer.weight', 'model.layers.20.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.v_proj.lora_A.default.weight', 'model.layers.20.self_attn.v_proj.lora_B.default.weight', 'model.layers.21.self_attn.k_proj.base_layer.bias', 'model.layers.21.self_attn.k_proj.base_layer.weight', 'model.layers.21.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.k_proj.lora_A.default.weight', 'model.layers.21.self_attn.k_proj.lora_B.default.weight', 'model.layers.21.self_attn.o_proj.base_layer.bias', 'model.layers.21.self_attn.o_proj.base_layer.weight', 'model.layers.21.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.o_proj.lora_A.default.weight', 'model.layers.21.self_attn.o_proj.lora_B.default.weight', 'model.layers.21.self_attn.q_proj.base_layer.bias', 'model.layers.21.self_attn.q_proj.base_layer.weight', 'model.layers.21.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.q_proj.lora_A.default.weight', 'model.layers.21.self_attn.q_proj.lora_B.default.weight', 'model.layers.21.self_attn.v_proj.base_layer.bias', 'model.layers.21.self_attn.v_proj.base_layer.weight', 'model.layers.21.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.v_proj.lora_A.default.weight', 'model.layers.21.self_attn.v_proj.lora_B.default.weight', 'model.layers.22.self_attn.k_proj.base_layer.bias', 'model.layers.22.self_attn.k_proj.base_layer.weight', 'model.layers.22.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.k_proj.lora_A.default.weight', 'model.layers.22.self_attn.k_proj.lora_B.default.weight', 'model.layers.22.self_attn.o_proj.base_layer.bias', 'model.layers.22.self_attn.o_proj.base_layer.weight', 'model.layers.22.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.o_proj.lora_A.default.weight', 'model.layers.22.self_attn.o_proj.lora_B.default.weight', 'model.layers.22.self_attn.q_proj.base_layer.bias', 'model.layers.22.self_attn.q_proj.base_layer.weight', 'model.layers.22.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.q_proj.lora_A.default.weight', 'model.layers.22.self_attn.q_proj.lora_B.default.weight', 'model.layers.22.self_attn.v_proj.base_layer.bias', 'model.layers.22.self_attn.v_proj.base_layer.weight', 'model.layers.22.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.v_proj.lora_A.default.weight', 'model.layers.22.self_attn.v_proj.lora_B.default.weight', 'model.layers.23.self_attn.k_proj.base_layer.bias', 'model.layers.23.self_attn.k_proj.base_layer.weight', 'model.layers.23.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.k_proj.lora_A.default.weight', 'model.layers.23.self_attn.k_proj.lora_B.default.weight', 'model.layers.23.self_attn.o_proj.base_layer.bias', 'model.layers.23.self_attn.o_proj.base_layer.weight', 'model.layers.23.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.o_proj.lora_A.default.weight', 'model.layers.23.self_attn.o_proj.lora_B.default.weight', 'model.layers.23.self_attn.q_proj.base_layer.bias', 'model.layers.23.self_attn.q_proj.base_layer.weight', 'model.layers.23.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.q_proj.lora_A.default.weight', 'model.layers.23.self_attn.q_proj.lora_B.default.weight', 'model.layers.23.self_attn.v_proj.base_layer.bias', 'model.layers.23.self_attn.v_proj.base_layer.weight', 'model.layers.23.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.v_proj.lora_A.default.weight', 'model.layers.23.self_attn.v_proj.lora_B.default.weight', 'model.layers.24.self_attn.k_proj.base_layer.bias', 'model.layers.24.self_attn.k_proj.base_layer.weight', 'model.layers.24.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.k_proj.lora_A.default.weight', 'model.layers.24.self_attn.k_proj.lora_B.default.weight', 'model.layers.24.self_attn.o_proj.base_layer.bias', 'model.layers.24.self_attn.o_proj.base_layer.weight', 'model.layers.24.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.o_proj.lora_A.default.weight', 'model.layers.24.self_attn.o_proj.lora_B.default.weight', 'model.layers.24.self_attn.q_proj.base_layer.bias', 'model.layers.24.self_attn.q_proj.base_layer.weight', 'model.layers.24.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.q_proj.lora_A.default.weight', 'model.layers.24.self_attn.q_proj.lora_B.default.weight', 'model.layers.24.self_attn.v_proj.base_layer.bias', 'model.layers.24.self_attn.v_proj.base_layer.weight', 'model.layers.24.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.v_proj.lora_A.default.weight', 'model.layers.24.self_attn.v_proj.lora_B.default.weight', 'model.layers.25.self_attn.k_proj.base_layer.bias', 'model.layers.25.self_attn.k_proj.base_layer.weight', 'model.layers.25.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.k_proj.lora_A.default.weight', 'model.layers.25.self_attn.k_proj.lora_B.default.weight', 'model.layers.25.self_attn.o_proj.base_layer.bias', 'model.layers.25.self_attn.o_proj.base_layer.weight', 'model.layers.25.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.o_proj.lora_A.default.weight', 'model.layers.25.self_attn.o_proj.lora_B.default.weight', 'model.layers.25.self_attn.q_proj.base_layer.bias', 'model.layers.25.self_attn.q_proj.base_layer.weight', 'model.layers.25.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.q_proj.lora_A.default.weight', 'model.layers.25.self_attn.q_proj.lora_B.default.weight', 'model.layers.25.self_attn.v_proj.base_layer.bias', 'model.layers.25.self_attn.v_proj.base_layer.weight', 'model.layers.25.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.v_proj.lora_A.default.weight', 'model.layers.25.self_attn.v_proj.lora_B.default.weight', 'model.layers.26.self_attn.k_proj.base_layer.bias', 'model.layers.26.self_attn.k_proj.base_layer.weight', 'model.layers.26.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.k_proj.lora_A.default.weight', 'model.layers.26.self_attn.k_proj.lora_B.default.weight', 'model.layers.26.self_attn.o_proj.base_layer.bias', 'model.layers.26.self_attn.o_proj.base_layer.weight', 'model.layers.26.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.o_proj.lora_A.default.weight', 'model.layers.26.self_attn.o_proj.lora_B.default.weight', 'model.layers.26.self_attn.q_proj.base_layer.bias', 'model.layers.26.self_attn.q_proj.base_layer.weight', 'model.layers.26.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.q_proj.lora_A.default.weight', 'model.layers.26.self_attn.q_proj.lora_B.default.weight', 'model.layers.26.self_attn.v_proj.base_layer.bias', 'model.layers.26.self_attn.v_proj.base_layer.weight', 'model.layers.26.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.v_proj.lora_A.default.weight', 'model.layers.26.self_attn.v_proj.lora_B.default.weight', 'model.layers.27.self_attn.k_proj.base_layer.bias', 'model.layers.27.self_attn.k_proj.base_layer.weight', 'model.layers.27.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.k_proj.lora_A.default.weight', 'model.layers.27.self_attn.k_proj.lora_B.default.weight', 'model.layers.27.self_attn.o_proj.base_layer.bias', 'model.layers.27.self_attn.o_proj.base_layer.weight', 'model.layers.27.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.o_proj.lora_A.default.weight', 'model.layers.27.self_attn.o_proj.lora_B.default.weight', 'model.layers.27.self_attn.q_proj.base_layer.bias', 'model.layers.27.self_attn.q_proj.base_layer.weight', 'model.layers.27.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.q_proj.lora_A.default.weight', 'model.layers.27.self_attn.q_proj.lora_B.default.weight', 'model.layers.27.self_attn.v_proj.base_layer.bias', 'model.layers.27.self_attn.v_proj.base_layer.weight', 'model.layers.27.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.v_proj.lora_A.default.weight', 'model.layers.27.self_attn.v_proj.lora_B.default.weight', 'model.layers.28.self_attn.k_proj.base_layer.bias', 'model.layers.28.self_attn.k_proj.base_layer.weight', 'model.layers.28.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.k_proj.lora_A.default.weight', 'model.layers.28.self_attn.k_proj.lora_B.default.weight', 'model.layers.28.self_attn.o_proj.base_layer.bias', 'model.layers.28.self_attn.o_proj.base_layer.weight', 'model.layers.28.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.o_proj.lora_A.default.weight', 'model.layers.28.self_attn.o_proj.lora_B.default.weight', 'model.layers.28.self_attn.q_proj.base_layer.bias', 'model.layers.28.self_attn.q_proj.base_layer.weight', 'model.layers.28.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.q_proj.lora_A.default.weight', 'model.layers.28.self_attn.q_proj.lora_B.default.weight', 'model.layers.28.self_attn.v_proj.base_layer.bias', 'model.layers.28.self_attn.v_proj.base_layer.weight', 'model.layers.28.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.v_proj.lora_A.default.weight', 'model.layers.28.self_attn.v_proj.lora_B.default.weight', 'model.layers.29.self_attn.k_proj.base_layer.bias', 'model.layers.29.self_attn.k_proj.base_layer.weight', 'model.layers.29.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.k_proj.lora_A.default.weight', 'model.layers.29.self_attn.k_proj.lora_B.default.weight', 'model.layers.29.self_attn.o_proj.base_layer.bias', 'model.layers.29.self_attn.o_proj.base_layer.weight', 'model.layers.29.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.o_proj.lora_A.default.weight', 'model.layers.29.self_attn.o_proj.lora_B.default.weight', 'model.layers.29.self_attn.q_proj.base_layer.bias', 'model.layers.29.self_attn.q_proj.base_layer.weight', 'model.layers.29.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.q_proj.lora_A.default.weight', 'model.layers.29.self_attn.q_proj.lora_B.default.weight', 'model.layers.29.self_attn.v_proj.base_layer.bias', 'model.layers.29.self_attn.v_proj.base_layer.weight', 'model.layers.29.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.v_proj.lora_A.default.weight', 'model.layers.29.self_attn.v_proj.lora_B.default.weight', 'model.layers.3.self_attn.k_proj.base_layer.bias', 'model.layers.3.self_attn.k_proj.base_layer.weight', 'model.layers.3.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.k_proj.lora_A.default.weight', 'model.layers.3.self_attn.k_proj.lora_B.default.weight', 'model.layers.3.self_attn.o_proj.base_layer.bias', 'model.layers.3.self_attn.o_proj.base_layer.weight', 'model.layers.3.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.o_proj.lora_A.default.weight', 'model.layers.3.self_attn.o_proj.lora_B.default.weight', 'model.layers.3.self_attn.q_proj.base_layer.bias', 'model.layers.3.self_attn.q_proj.base_layer.weight', 'model.layers.3.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.q_proj.lora_A.default.weight', 'model.layers.3.self_attn.q_proj.lora_B.default.weight', 'model.layers.3.self_attn.v_proj.base_layer.bias', 'model.layers.3.self_attn.v_proj.base_layer.weight', 'model.layers.3.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.v_proj.lora_A.default.weight', 'model.layers.3.self_attn.v_proj.lora_B.default.weight', 'model.layers.4.self_attn.k_proj.base_layer.bias', 'model.layers.4.self_attn.k_proj.base_layer.weight', 'model.layers.4.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.k_proj.lora_A.default.weight', 'model.layers.4.self_attn.k_proj.lora_B.default.weight', 'model.layers.4.self_attn.o_proj.base_layer.bias', 'model.layers.4.self_attn.o_proj.base_layer.weight', 'model.layers.4.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.o_proj.lora_A.default.weight', 'model.layers.4.self_attn.o_proj.lora_B.default.weight', 'model.layers.4.self_attn.q_proj.base_layer.bias', 'model.layers.4.self_attn.q_proj.base_layer.weight', 'model.layers.4.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.q_proj.lora_A.default.weight', 'model.layers.4.self_attn.q_proj.lora_B.default.weight', 'model.layers.4.self_attn.v_proj.base_layer.bias', 'model.layers.4.self_attn.v_proj.base_layer.weight', 'model.layers.4.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.v_proj.lora_A.default.weight', 'model.layers.4.self_attn.v_proj.lora_B.default.weight', 'model.layers.5.self_attn.k_proj.base_layer.bias', 'model.layers.5.self_attn.k_proj.base_layer.weight', 'model.layers.5.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.k_proj.lora_A.default.weight', 'model.layers.5.self_attn.k_proj.lora_B.default.weight', 'model.layers.5.self_attn.o_proj.base_layer.bias', 'model.layers.5.self_attn.o_proj.base_layer.weight', 'model.layers.5.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.o_proj.lora_A.default.weight', 'model.layers.5.self_attn.o_proj.lora_B.default.weight', 'model.layers.5.self_attn.q_proj.base_layer.bias', 'model.layers.5.self_attn.q_proj.base_layer.weight', 'model.layers.5.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.q_proj.lora_A.default.weight', 'model.layers.5.self_attn.q_proj.lora_B.default.weight', 'model.layers.5.self_attn.v_proj.base_layer.bias', 'model.layers.5.self_attn.v_proj.base_layer.weight', 'model.layers.5.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.v_proj.lora_A.default.weight', 'model.layers.5.self_attn.v_proj.lora_B.default.weight', 'model.layers.6.self_attn.k_proj.base_layer.bias', 'model.layers.6.self_attn.k_proj.base_layer.weight', 'model.layers.6.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.k_proj.lora_A.default.weight', 'model.layers.6.self_attn.k_proj.lora_B.default.weight', 'model.layers.6.self_attn.o_proj.base_layer.bias', 'model.layers.6.self_attn.o_proj.base_layer.weight', 'model.layers.6.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.o_proj.lora_A.default.weight', 'model.layers.6.self_attn.o_proj.lora_B.default.weight', 'model.layers.6.self_attn.q_proj.base_layer.bias', 'model.layers.6.self_attn.q_proj.base_layer.weight', 'model.layers.6.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.q_proj.lora_A.default.weight', 'model.layers.6.self_attn.q_proj.lora_B.default.weight', 'model.layers.6.self_attn.v_proj.base_layer.bias', 'model.layers.6.self_attn.v_proj.base_layer.weight', 'model.layers.6.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.v_proj.lora_A.default.weight', 'model.layers.6.self_attn.v_proj.lora_B.default.weight', 'model.layers.7.self_attn.k_proj.base_layer.bias', 'model.layers.7.self_attn.k_proj.base_layer.weight', 'model.layers.7.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.k_proj.lora_A.default.weight', 'model.layers.7.self_attn.k_proj.lora_B.default.weight', 'model.layers.7.self_attn.o_proj.base_layer.bias', 'model.layers.7.self_attn.o_proj.base_layer.weight', 'model.layers.7.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.o_proj.lora_A.default.weight', 'model.layers.7.self_attn.o_proj.lora_B.default.weight', 'model.layers.7.self_attn.q_proj.base_layer.bias', 'model.layers.7.self_attn.q_proj.base_layer.weight', 'model.layers.7.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.q_proj.lora_A.default.weight', 'model.layers.7.self_attn.q_proj.lora_B.default.weight', 'model.layers.7.self_attn.v_proj.base_layer.bias', 'model.layers.7.self_attn.v_proj.base_layer.weight', 'model.layers.7.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.v_proj.lora_A.default.weight', 'model.layers.7.self_attn.v_proj.lora_B.default.weight', 'model.layers.8.self_attn.k_proj.base_layer.bias', 'model.layers.8.self_attn.k_proj.base_layer.weight', 'model.layers.8.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.k_proj.lora_A.default.weight', 'model.layers.8.self_attn.k_proj.lora_B.default.weight', 'model.layers.8.self_attn.o_proj.base_layer.bias', 'model.layers.8.self_attn.o_proj.base_layer.weight', 'model.layers.8.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.o_proj.lora_A.default.weight', 'model.layers.8.self_attn.o_proj.lora_B.default.weight', 'model.layers.8.self_attn.q_proj.base_layer.bias', 'model.layers.8.self_attn.q_proj.base_layer.weight', 'model.layers.8.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.q_proj.lora_A.default.weight', 'model.layers.8.self_attn.q_proj.lora_B.default.weight', 'model.layers.8.self_attn.v_proj.base_layer.bias', 'model.layers.8.self_attn.v_proj.base_layer.weight', 'model.layers.8.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.v_proj.lora_A.default.weight', 'model.layers.8.self_attn.v_proj.lora_B.default.weight', 'model.layers.9.self_attn.k_proj.base_layer.bias', 'model.layers.9.self_attn.k_proj.base_layer.weight', 'model.layers.9.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.k_proj.lora_A.default.weight', 'model.layers.9.self_attn.k_proj.lora_B.default.weight', 'model.layers.9.self_attn.o_proj.base_layer.bias', 'model.layers.9.self_attn.o_proj.base_layer.weight', 'model.layers.9.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.o_proj.lora_A.default.weight', 'model.layers.9.self_attn.o_proj.lora_B.default.weight', 'model.layers.9.self_attn.q_proj.base_layer.bias', 'model.layers.9.self_attn.q_proj.base_layer.weight', 'model.layers.9.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.q_proj.lora_A.default.weight', 'model.layers.9.self_attn.q_proj.lora_B.default.weight', 'model.layers.9.self_attn.v_proj.base_layer.bias', 'model.layers.9.self_attn.v_proj.base_layer.weight', 'model.layers.9.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.v_proj.lora_A.default.weight', 'model.layers.9.self_attn.v_proj.lora_B.default.weight']
- This IS expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Starcoder2ForCausalLM were not initialized from the model checkpoint at finetune_starcoder2/final_checkpoint and are newly initialized: ['model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.24.self_attn.o_proj.bias', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.29.self_attn.q_proj.bias', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.3.self_attn.k_proj.bias', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.bias', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.9.self_attn.o_proj.bias', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.bias', 'model.layers.9.self_attn.v_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig()

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained("finetune_starcoder2/final_checkpoint", quantization_config=quantization_config)

inputs = tokenizer.encode("hello_world_function <- function() {", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

The expected output of the fine tuning process should be the peft adapter weights, instead of the whole model. Once the training process is finished, you'll see that under the final_checkpoint there are the safetensors of the model, but if you check the size is similar to the original model, while the adapter should be in the order of MB. I guess something is broken with one of the libraries.

If you check the object type you'll notice the problem:

    print("Training...")
    trainer.train()

    print("Saving the last checkpoint of the model")
    print("Original model type:", type(model))
    print("Trained model type:", type(trainer.model))

The output for the model type is:

Saving the last checkpoint of the model
Original model type: <class 'transformers.models.starcoder2.modeling_starcoder2.Starcoder2ForCausalLM'>
Trained model type: <class 'peft.peft_model.PeftModelForCausalLM'>
Training Done! 💥

I've temporarily fixed this issue by changing the next line (although I'm not sure if it's totally correct):
model.save_pretrained(os.path.join(args.output_dir, "final_checkpoint/"))
to:
trainer.model.save_pretrained(os.path.join(args.output_dir, "final_checkpoint/"))

Then, I can use the merge_peft_adapters.py from StarCoder's repository and do inference.

noraise · Answer 3 · Mon Mar 25 2024 17:21:16 GMT+0800 (China Standard Time)

Hey, nice work! How’s the performance? I am wondering if it’s even worth to do this.

Julio Sánchez · Answer 4 · Mon Mar 25 2024 17:34:12 GMT+0800 (China Standard Time)

Hey, nice work! How’s the performance? I am wondering if it’s even worth to do this.

In my case is a must, since I'm doing Instruction Fine-tuning and the performance of the fine-tuned model is good as expected.