RobinSmits / Dutch-LLMs

Various training, inference and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.

Incompatible Matrix Multiplication

ankitpdc opened this issue · comments

@RobinSmits I tried running the model polylm_13b_ft_alpaca_clean_dutch with the same sample data, but I am getting an error about an incompatible matrix multiplication.
I want to check the model's performance for the Dutch language. What changes would you suggest?

```
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[21], line 39
37 for item in val_data:
38 print(f'\n\n=== Voorbeeld: {counter} ======================================================================================')
---> 39 generate(item['instruction'], item['input'])
41 counter += 1
42 if counter > 5:

Cell In[21], line 11, in generate(instruction, input)
8 attention_masks = inputs.attention_mask.cuda()
10 # Generate output
---> 11 outputs = model.generate(input_ids = input_ids,
12 attention_mask = attention_masks,
13 max_new_tokens = 128,
14 do_sample = True,
15 top_p = 0.85,
16 top_k = 50,
17 temperature = 0.5,
18 repetition_penalty = 1.2,
19 length_penalty = -1.0,
20 num_return_sequences = 1,
21 pad_token_id = tokenizer.eos_token_id,
22 forced_eos_token_id = tokenizer.eos_token_id)
24 # Decode output
25 generated_output = tokenizer.decode(outputs[0], skip_special_tokens = True)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/peft/peft_model.py:975, in PeftModelForCausalLM.generate(self, **kwargs)
973 self.base_model.generation_config = self.generation_config
974 try:
--> 975 outputs = self.base_model.generate(**kwargs)
976 except:
977 self.base_model.prepare_inputs_for_generation = self.base_model_prepare_inputs_for_generation

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/generation/utils.py:1648, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
1640 input_ids, model_kwargs = self._expand_inputs_for_generation(
1641 input_ids=input_ids,
1642 expand_size=generation_config.num_return_sequences,
1643 is_encoder_decoder=self.config.is_encoder_decoder,
1644 **model_kwargs,
1645 )
1647 # 13. run sample
-> 1648 return self.sample(
1649 input_ids,
1650 logits_processor=logits_processor,
1651 logits_warper=logits_warper,
1652 stopping_criteria=stopping_criteria,
1653 pad_token_id=generation_config.pad_token_id,
1654 eos_token_id=generation_config.eos_token_id,
1655 output_scores=generation_config.output_scores,
1656 return_dict_in_generate=generation_config.return_dict_in_generate,
1657 synced_gpus=synced_gpus,
1658 streamer=streamer,
1659 **model_kwargs,
1660 )
1662 elif generation_mode == GenerationMode.BEAM_SEARCH:
1663 # 11. prepare beam search scorer
1664 beam_scorer = BeamSearchScorer(
1665 batch_size=batch_size,
1666 num_beams=generation_config.num_beams,
(...)
1671 max_length=generation_config.max_length,
1672 )

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/generation/utils.py:2730, in GenerationMixin.sample(self, input_ids, logits_processor, stopping_criteria, logits_warper, max_length, pad_token_id, eos_token_id, output_attentions, output_hidden_states, output_scores, return_dict_in_generate, synced_gpus, streamer, **model_kwargs)
2727 model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
2729 # forward pass to get next token
-> 2730 outputs = self(
2731 **model_inputs,
2732 return_dict=True,
2733 output_attentions=output_attentions,
2734 output_hidden_states=output_hidden_states,
2735 )
2737 if synced_gpus and this_peer_finished:
2738 continue # don't waste resources running the code we don't need

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:1076, in GPT2LMHeadModel.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, labels, use_cache, output_attentions, output_hidden_states, return_dict)
1068 r"""
1069 labels (torch.LongTensor of shape (batch_size, sequence_length), optional):
1070 Labels for language modeling. Note that the labels are shifted inside the model, i.e. you can set
1071 labels = input_ids Indices are selected in [-100, 0, ..., config.vocab_size] All labels set to -100
1072 are ignored (masked), the loss is only computed for labels in [0, ..., config.vocab_size]
1073 """
1074 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1076 transformer_outputs = self.transformer(
1077 input_ids,
1078 past_key_values=past_key_values,
1079 attention_mask=attention_mask,
1080 token_type_ids=token_type_ids,
1081 position_ids=position_ids,
1082 head_mask=head_mask,
1083 inputs_embeds=inputs_embeds,
1084 encoder_hidden_states=encoder_hidden_states,
1085 encoder_attention_mask=encoder_attention_mask,
1086 use_cache=use_cache,
1087 output_attentions=output_attentions,
1088 output_hidden_states=output_hidden_states,
1089 return_dict=return_dict,
1090 )
1091 hidden_states = transformer_outputs[0]
1093 # Set device for model parallelism

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:900, in GPT2Model.forward(self, input_ids, past_key_values, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions, output_hidden_states, return_dict)
890 outputs = torch.utils.checkpoint.checkpoint(
891 create_custom_forward(block),
892 hidden_states,
(...)
897 encoder_attention_mask,
898 )
899 else:
--> 900 outputs = block(
901 hidden_states,
902 layer_past=layer_past,
903 attention_mask=attention_mask,
904 head_mask=head_mask[i],
905 encoder_hidden_states=encoder_hidden_states,
906 encoder_attention_mask=encoder_attention_mask,
907 use_cache=use_cache,
908 output_attentions=output_attentions,
909 )
911 hidden_states = outputs[0]
912 if use_cache is True:

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:390, in GPT2Block.forward(self, hidden_states, layer_past, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions)
388 residual = hidden_states
389 hidden_states = self.ln_1(hidden_states)
--> 390 attn_outputs = self.attn(
391 hidden_states,
392 layer_past=layer_past,
393 attention_mask=attention_mask,
394 head_mask=head_mask,
395 use_cache=use_cache,
396 output_attentions=output_attentions,
397 )
398 attn_output = attn_outputs[0] # output_attn: a, present, (attentions)
399 outputs = attn_outputs[1:]

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/accelerate/hooks.py:165, in add_hook_to_module.<locals>.new_forward(*args, **kwargs)
163 output = old_forward(*args, **kwargs)
164 else:
--> 165 output = old_forward(*args, **kwargs)
166 return module._hf_hook.post_forward(module, output)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:312, in GPT2Attention.forward(self, hidden_states, layer_past, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, use_cache, output_attentions)
310 attention_mask = encoder_attention_mask
311 else:
--> 312 query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
314 query = self._split_heads(query, self.num_heads, self.head_dim)
315 key = self._split_heads(key, self.num_heads, self.head_dim)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/peft/tuners/lora.py:1208, in Linear4bit.forward(self, x)
1207 def forward(self, x: torch.Tensor):
-> 1208 result = super().forward(x)
1210 if self.disable_adapters or self.active_adapter not in self.lora_A.keys():
1211 return result

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/bitsandbytes/nn/modules.py:248, in Linear4bit.forward(self, x)
245 x = x.to(self.compute_dtype)
247 bias = None if self.bias is None else self.bias.to(self.compute_dtype)
--> 248 out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
250 out = out.to(inp_dtype)
252 return out

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:579, in matmul_4bit(A, B, quant_state, out, bias)
577 return out
578 else:
--> 579 return MatMul4Bit.apply(A, B, out, bias, quant_state)

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs)
503 if not torch._C._are_functorch_transforms_active():
504 # See NOTE: [functorch vjp and autograd interaction]
505 args = _functorch.utils.unwrap_dead_wrappers(args)
--> 506 return super().apply(*args, **kwargs) # type: ignore[misc]
508 if cls.setup_context == _SingleLevelFunction.setup_context:
509 raise RuntimeError(
510 'In order to use an autograd.Function with functorch transforms '
511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
512 'staticmethod. For more details, please see '
513 'https://pytorch.org/docs/master/notes/extending.func.html')

File ~/Projects/ankit/venv39/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py:516, in MatMul4Bit.forward(ctx, A, B, out, bias, state)
511 return torch.empty(A.shape[:-1] + B_shape[:1], dtype=A.dtype, device=A.device)
514 # 1. Dequantize
515 # 2. MatmulnN
--> 516 output = torch.nn.functional.linear(A, F.dequantize_4bit(B, state).to(A.dtype).t(), bias)
518 # 3. Save state
519 ctx.state = state

RuntimeError: mat1 and mat2 shapes cannot be multiplied (31x5120 and 15360x5120)
```
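
For context, the error is raised from a 4-bit (QLoRA) inference setup: the quantized PolyLM 13B base model with the Dutch LoRA adapter loaded on top via PEFT. Below is a minimal sketch of that setup. The Hugging Face model IDs, the prompt template and the tokenizer defaults are assumptions (the repository's inference notebook contains the actual versions); the generate() arguments are the ones visible in the traceback.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Model IDs are assumptions based on this thread; adjust them to your own setup.
base_model_id = "DAMO-NLP-MT/polylm-13b"
adapter_id = "robinsmits/polylm_13b_ft_alpaca_clean_dutch"

# Load the base model quantized to 4-bit NF4 and attach the LoRA adapter.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

def generate(instruction, input):
    # The prompt template below is an assumption; use the template from the notebook.
    prompt = f"### Instructie:\n{instruction}\n\n### Invoer:\n{input}\n\n### Antwoord:\n"
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs.input_ids.cuda()
    attention_masks = inputs.attention_mask.cuda()

    # Generation arguments as shown in the traceback above.
    outputs = model.generate(input_ids=input_ids,
                             attention_mask=attention_masks,
                             max_new_tokens=128,
                             do_sample=True,
                             top_p=0.85,
                             top_k=50,
                             temperature=0.5,
                             repetition_penalty=1.2,
                             num_return_sequences=1,
                             pad_token_id=tokenizer.eos_token_id,
                             forced_eos_token_id=tokenizer.eos_token_id)

    # Decode and print the generated output.
    generated_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(generated_output)
    return generated_output
```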

Hi @ankitpdc,
Thanks for reporting the issue. I'm not exactly sure at this moment what the root cause is.

I did notice that the base model at HuggingFace for PolyLM 13B was modified shortly after I trained and published my adapter model.

I suspect the error is related to this. I will test inference myself in a few days and see whether I get the same error with the newer base model.

If so, then I will retrain the adapter model sometime next week.

I will keep you updated.
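
One possible interim workaround (a sketch, not something verified in this thread) would be to pin the base model to the Hugging Face revision it had when the adapter was trained, via the revision argument of from_pretrained. The commit hash is a placeholder here; it would have to be looked up in the base model's commit history on the Hub.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical workaround: load the base model at the commit that matches the
# adapter. Replace "main" with the specific pre-update commit hash from the
# model's history on the Hugging Face Hub; the model ID is assumed from this thread.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "DAMO-NLP-MT/polylm-13b",
    revision="main",               # replace with the pre-update commit hash
    quantization_config=bnb_config,
    device_map="auto",
)
```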

@ankitpdc
I did some quick tests in a Kaggle Notebook.

https://www.kaggle.com/code/rsmits/polylm-13b-inference

The model there works without errors.

I suspect it is related to one of the Python library versions. Verify the versions of torch, peft, accelerate and bitsandbytes against the ones in the notebook (see the snippet below).
That should work.
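
A quick way to print the installed versions for comparison (a small helper sketch):

```python
# Print the installed versions of the relevant libraries so they can be
# compared with the versions listed in the Kaggle notebook.
from importlib.metadata import version

for pkg in ("torch", "transformers", "peft", "accelerate", "bitsandbytes"):
    print(f"{pkg}: {version(pkg)}")
```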

Let me know if it solves the problem.

@RobinSmits Even with the same versions of torch, peft, accelerate and bitsandbytes, I am getting the same error. I am trying to understand the reason for the error. Thank you for updating the Kaggle notebook with the package versions.

Also, please let me know if you find any other possible cause for the error. I will post here as soon as I find the reason myself.

Update: I just checked. In the Colab notebook you tested with open_llama_13b_alpaca_clean_dutch_qlora, and that model works perfectly for me as well. I am only facing the error with the polylm-13b-inference (polylm_13b_ft_alpaca_clean_dutch) model.

@ankitpdc Apologies, my mistake ;-)
I've updated the notebook and it is now running for the PolyLM 13B model, so I will see the results tomorrow.

If the adapter model gives the same errors you experienced, then it is likely because the base model was updated.

@ankitpdc The test run in the Kaggle notebook had exactly the same error as you posted.

I will retrain the adapter model sometime in the coming week, as this is related to the updated base model.

@ankitpdc I've retrained the PolyLM 13B adapter model and pushed it to Hugging Face. The updated training and inference notebooks are also committed to GitHub.

Good luck!