Text-Image Matching in LayoutLMV2
fatfishZhao opened this issue
Describe
Model I am using (LayoutLMV2).
Hi, thanks for your great work on LayoutLMV2. I am trying to reproduce the pretraining process of this model, and I have two questions about Text-Image Matching (TIM).
- When constructing a negative sample, the paper says to "perform the same masking and covering operations to images in negative samples". So on the image, most of the text is masked by shuffled boxes and text lines are cut at random, which I think is a very obvious visual cue. Does this make the TIM task too easy to learn?
- When the negative image differs in size from the positive image, how should the masks be drawn on the negative image? Do I need to resize it to the positive image's size first?
Hope someone can give me some help.
Thanks.
- Text-image Matching is easier to learn than the text-image alignment task.
- All images are resized to 224x224.
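
For anyone else reproducing this, here is a minimal sketch of what the second answer implies: if every page ends up at 224x224, boxes measured on a page of any size can be scaled into that one frame before covering, so the size mismatch goes away. This is not the authors' code. `scale_box`, `cover`, the page size, and the line boxes are hypothetical names and values for illustration, and applying the positive page's line boxes to the negative image is an assumption about how "the same operations" would be carried out.

```python
TARGET = 224  # side length given in the reply above


def scale_box(box, src_w, src_h, target=TARGET):
    """Map an (x0, y0, x1, y1) pixel box from a src_w x src_h page
    into a target x target frame, using integer floor division."""
    x0, y0, x1, y1 = box
    return (
        x0 * target // src_w,
        y0 * target // src_h,
        x1 * target // src_w,
        y1 * target // src_h,
    )


def cover(image, box, fill=0):
    """Overwrite the pixels inside box, in place.
    image is a row-major list of rows (image[y][x])."""
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            image[y][x] = fill


# Hypothetical data: the positive page is 1000x1400 with two text-line boxes.
positive_size = (1000, 1400)
line_boxes = [(100, 140, 900, 210), (100, 700, 500, 770)]

# Stand-in for a negative page that has already been resized to 224x224.
negative = [[255] * TARGET for _ in range(TARGET)]

# Assumption: cover the negative image with boxes scaled into the shared frame.
for box in line_boxes:
    cover(negative, scale_box(box, *positive_size))

covered_pixels = sum(row.count(0) for row in negative)
```

With a real pipeline the resize itself would be done by an image library, and only the box scaling would look like the above.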