No attention mask is passed as input during training
double-fire-0 opened this issue
double-fire-0 commented
Looking at the Hugging Face MiniCPM-V code, I noticed that attention_mask is None when the LLaMA-3 forward function is called.
This works fine with batch size 1, but it does not seem correct for batch size > 1, since batched inputs are padded and the shorter sequences would then attend to padding tokens.
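A minimal sketch of why this matters (this is illustrative, not the MiniCPM-V code itself; the checkpoint name is an assumption, and any LLaMA-3 tokenizer would do): with batch size > 1 the tokenizer pads the batch, and the resulting attention_mask marks which positions are real tokens versus padding. If that mask is dropped and None is passed to forward, padded positions are treated as real context.

```python
# Illustrative sketch (not MiniCPM-V's code): padding appears as soon as
# batch size > 1, so the mask produced by the tokenizer must be forwarded.
from transformers import AutoTokenizer

# Assumption: any LLaMA-3 tokenizer; the exact repo id is just an example.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-3 defines no pad token

batch = tokenizer(
    ["a short prompt",
     "a much longer prompt that forces padding of the shorter one"],
    padding=True,
    return_tensors="pt",
)

# attention_mask is 1 for real tokens, 0 for padding. With batch size 1
# there is no padding, so attention_mask=None happens to work; with
# batch size > 1 it should be passed through, e.g.:
# outputs = model(input_ids=batch["input_ids"],
#                 attention_mask=batch["attention_mask"])
print(batch["attention_mask"])
```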