No attention mask is passed as input during training
double-fire-0 opened this issue
double-fire-0 commented
Looking at the Hugging Face MiniCPM-V code, I noticed that attention_mask is None when the LLaMA-3 forward function is called.
This works fine with batch size 1, but it does not seem correct for batch size > 1, since batched inputs are padded and the shorter sequences would then attend to padding tokens.
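A minimal sketch of why this matters (this is illustrative, not the MiniCPM-V code itself; the checkpoint name is an assumption, and any LLaMA-3 tokenizer would do): with batch size > 1 the tokenizer pads the batch, and the resulting attention_mask marks which positions are real tokens versus padding. If that mask is dropped and None is passed to forward, padded positions are treated as real context.

```python
# Illustrative sketch (not MiniCPM-V's code): padding appears as soon as
# batch size > 1, so the mask produced by the tokenizer must be forwarded.
from transformers import AutoTokenizer

# Assumption: any LLaMA-3 tokenizer; the exact repo id is just an example.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA-3 defines no pad token

batch = tokenizer(
    ["a short prompt",
     "a much longer prompt that forces padding of the shorter one"],
    padding=True,
    return_tensors="pt",
)

# attention_mask is 1 for real tokens, 0 for padding. With batch size 1
# there is no padding, so attention_mask=None happens to work; with
# batch size > 1 it should be passed through, e.g.:
# outputs = model(input_ids=batch["input_ids"],
#                 attention_mask=batch["attention_mask"])
print(batch["attention_mask"])
```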