microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Home Page: https://aka.ms/GeneralAI


LayoutLMv3 extending language sequence length

ChristiaensBert opened this issue

I want to use LayoutLMv3 on full documents whose text sequence length exceeds 512 tokens. Is there a way to extend this limit, and if so, how should it be done?

Alternatively, could I split the document into two sequences and forward each of them with the image, or would that lose too much context?

@ChristiaensBert Yes, splitting the document into multiple sequences and forwarding each with the image is common practice.
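
For anyone looking for a concrete starting point, here is a minimal sketch of that sliding-window approach, assuming the Hugging Face `LayoutLMv3Processor` with `apply_ocr=False` and pre-extracted words and boxes. The window/stride sizes, dummy inputs, and label count are illustrative placeholders, not values from this thread:

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification

processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=7  # hypothetical label count
)
model.eval()

# Stand-ins for a real page: OCR words, one 0-1000-normalized box per word.
words = ["word"] * 928
boxes = [[0, 0, 50, 50]] * 928
image = Image.new("RGB", (1000, 1000), "white")

# Overlapping word windows so each chunk fits the 512-token limit.
# Word count is only a proxy for token count, so leave headroom for
# subword splitting and special tokens.
window, stride = 300, 250
predictions = [None] * len(words)

for start in range(0, len(words), stride):
    chunk_words = words[start:start + window]
    chunk_boxes = boxes[start:start + window]
    encoding = processor(
        image, chunk_words, boxes=chunk_boxes,
        truncation=True, padding="max_length",
        max_length=512, return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**encoding).logits  # (1, 512, num_labels)

    # Map token-level predictions back to word indices; in the overlap
    # region, keep the first prediction seen for each word.
    token_preds = logits.argmax(-1).squeeze(0).tolist()
    for tok_idx, word_id in enumerate(encoding.word_ids(0)):
        if word_id is not None and predictions[start + word_id] is None:
            predictions[start + word_id] = token_preds[tok_idx]

    if start + window >= len(words):
        break
```

How much context is lost depends on the overlap: a larger overlap (smaller stride) gives each word more surrounding context at the cost of extra forward passes.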

I have trained a LayoutLMv3 model with `"bbox": Array2D(dtype="int64", shape=(512, 4))`, but my documents have up to 928 boxes, so the trained model does not predict labels for all words (tokens).

I tried replacing the value 512 with 1024 and 2048, but during training I get:
`ValueError: cannot reshape array of size 2048 into shape (1,1024,4)`

Does anyone know how to change the config, or have any other idea for solving this problem?
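
A note on the arithmetic in that error: 2048 = 512 × 4, i.e. the processor is still emitting 512 rows per `bbox` array while the `datasets` schema now declares `(1024, 4)` (which would need 4096 values). Changing the `Array2D` shape alone is not enough; the processor must pad and truncate to the same length. Below is a hedged sketch of keeping the two in sync, assuming a `dataset` with `image`, `words`, `boxes`, and `labels` columns (hypothetical names). This only fixes the reshape error: LayoutLMv3 ships with 512 learned positions, so the model itself must also be extended before it can train at length 1024 (see the solutions linked further down).

```python
from datasets import Array2D, Array3D, Features, Sequence, Value
from transformers import LayoutLMv3Processor

processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)

MAX_LEN = 1024  # must match the schema below AND a model extended to 1024 positions

features = Features({
    "input_ids": Sequence(Value("int64"), length=MAX_LEN),
    "attention_mask": Sequence(Value("int64"), length=MAX_LEN),
    "bbox": Array2D(dtype="int64", shape=(MAX_LEN, 4)),
    "labels": Sequence(Value("int64"), length=MAX_LEN),
    "pixel_values": Array3D(dtype="float32", shape=(3, 224, 224)),
})

def encode(batch):
    # Pad/truncate to the SAME length the schema declares. With the
    # default max_length=512, each bbox array holds 512 * 4 = 2048
    # values, which cannot be reshaped to (1024, 4) -> the ValueError.
    return processor(
        batch["image"], batch["words"], boxes=batch["boxes"],
        word_labels=batch["labels"],
        truncation=True, padding="max_length", max_length=MAX_LEN,
    )

encoded = dataset.map(
    encode, batched=True, features=features,
    remove_columns=dataset.column_names,
)
```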

Hi @rusubbiz-muzkaq,

Did you find a way to work with sequences of more than 512 tokens on LayoutLMv3? I am also getting the same error.

Hi, I have the same problem as @rusubbiz-muzkaq and @jyotiyadav94 and haven't figured it out yet. Any updates?

Edit: NielsRogge/Transformers-Tutorials#203

Hi all,

I got it working for a longer sequence length. See #942 (comment).

Thank you :)

Hi all!
I have explained my solution for handling longer token sequences here; I hope it helps:

huggingface/transformers#19190 (comment)
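
For completeness: the common idea behind these solutions is to enlarge the model's learned 1D position-embedding table, reuse the pretrained rows to initialize the new positions, and then fine-tune. Below is a rough sketch of that pattern; it is not necessarily what the linked comment does, and the attribute paths, the RoBERTa-style offset of 2, and the `position_ids` buffer are assumptions about the Hugging Face implementation:

```python
import torch
from transformers import LayoutLMv3ForTokenClassification

model = LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base")

OFFSET = 2                # RoBERTa-style padding offset (assumption)
NEW_MAX = 1024 + OFFSET   # 1024 usable positions
old_emb = model.layoutlmv3.embeddings.position_embeddings  # attribute path: assumption
old_max, hidden = old_emb.weight.shape                     # (514, 768) on the base model

new_emb = torch.nn.Embedding(NEW_MAX, hidden, padding_idx=old_emb.padding_idx)
with torch.no_grad():
    # Keep the pretrained rows, then tile them over the new positions
    # (one common heuristic; interpolation or random init are alternatives).
    new_emb.weight[:old_max] = old_emb.weight
    for pos in range(old_max, NEW_MAX):
        new_emb.weight[pos] = old_emb.weight[OFFSET + (pos - OFFSET) % (old_max - OFFSET)]

model.layoutlmv3.embeddings.position_embeddings = new_emb
# Some transformers versions cache a position_ids buffer sized to the old
# maximum; re-register it at the new size (newer versions may compute it on the fly).
model.layoutlmv3.embeddings.register_buffer(
    "position_ids", torch.arange(NEW_MAX).expand((1, -1))
)
model.config.max_position_embeddings = NEW_MAX
```

The extended positions are effectively untrained relative to the originals, so expect to fine-tune before the longer range is useful, and remember to pass a matching `max_length` when tokenizing.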