Text-Image Matching in LayoutLMV2

Question


fatfishZhao opened this issue 2 years ago

Describe
Model I am using (LayoutLMV2).

Hi, thanks for your great work on LayoutLMV2. I am trying to reproduce the pretraining process of this model and have two questions about Text-Image Matching (TIM).

1. When constructing a negative sample, the paper says to "perform the same masking and covering operations to images in negative samples". As a result, most of the text on the image is covered by shuffled boxes and text lines are cut off randomly, which I think is a very obvious visual cue. Doesn't this make the TIM task very easy to learn?
2. When the negative image has a different size from the positive image, how should the masks be drawn on it? Do I need to resize it to the positive image's size first?
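The covering operation from the first question can be sketched roughly as below. This is a minimal illustration, not code from the LayoutLMV2 repository: `cover_lines`, the `cover_ratio` parameter, and the pixel box format `(x0, y0, x1, y1)` are hypothetical, and how the real pipeline chooses which lines to cover is not stated in this thread.

```python
import numpy as np

def cover_lines(image, line_boxes, cover_ratio, rng, fill=0):
    """Cover a random subset of text-line regions (hypothetical helper).

    image: uint8 array of shape (H, W, C).
    line_boxes: list of (x0, y0, x1, y1) pixel boxes, one per text line.
    cover_ratio: assumed fraction of lines to cover.
    Returns a covered copy of the image and the sorted indices of covered lines.
    """
    covered = image.copy()  # leave the input image untouched
    n_cover = int(round(cover_ratio * len(line_boxes)))
    chosen = rng.choice(len(line_boxes), size=n_cover, replace=False)
    for i in chosen:
        x0, y0, x1, y1 = line_boxes[i]
        covered[y0:y1, x0:x1] = fill  # rows are y, columns are x
    return covered, sorted(int(i) for i in chosen)

rng = np.random.default_rng(0)
boxes = [(10, 10, 200, 30), (10, 40, 200, 60), (10, 70, 200, 90), (10, 100, 200, 120)]
page = np.full((224, 224, 3), 255, dtype=np.uint8)
covered_page, covered_idx = cover_lines(page, boxes, 0.5, rng)
```

Applying the same function to positive and negative images means covered regions show up in both classes, so the mere presence of covering does not by itself reveal the TIM label.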

I hope someone can help.

Thanks.

wolfshow answered on Tue Jun 07 2022 15:08:53 GMT+0800 (China Standard Time)

Text-Image Matching is easier to learn than the Text-Image Alignment (TIA) task.
All images are resized to 224x224, so the positive and negative images always have the same size.
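Resizing to a common 224x224 grid, and rescaling pixel boxes to match, might look like the sketch below. It is an assumption-laden illustration: `resize_nearest` and `scale_box` are hypothetical helpers, not part of the LayoutLMV2 codebase, and nearest-neighbour interpolation is used only to keep the example dependency-light; the actual pipeline may interpolate differently.

```python
import numpy as np

SIZE = 224  # the answer states all images are resized to 224x224

def resize_nearest(image, size=SIZE):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols[None, :]]

def scale_box(box, orig_w, orig_h, size=SIZE):
    """Map an integer pixel box (x0, y0, x1, y1) onto the size x size grid."""
    x0, y0, x1, y1 = box
    # integer floor for the start and integer ceil for the end,
    # so the scaled box still fully covers the original region
    return (x0 * size // orig_w, y0 * size // orig_h,
            -(-x1 * size // orig_w), -(-y1 * size // orig_h))

positive = np.zeros((1000, 800, 3), dtype=np.uint8)
negative = np.zeros((600, 450, 3), dtype=np.uint8)
pos_224, neg_224 = resize_nearest(positive), resize_nearest(negative)
```

Once both images sit on the same 224x224 grid, any mask drawn in that grid lines up on either image regardless of their original sizes.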