microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Home Page: https://aka.ms/GeneralAI


LayoutLMv3 extending language sequence length

ChristiaensBert opened this issue

I want to use LayoutLMv3 on full documents whose text sequence length exceeds 512 tokens. Is there a way to extend this limit, and if so, how should it be done?

Alternatively, could I split the document into two sequences and forward each of them with the image, or would that lose too much context?

@ChristiaensBert Yes, splitting the document into multiple sequences and forwarding each with the image is common practice.
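
For anyone looking for a concrete starting point, here is a minimal sketch of that sliding-window approach, assuming the Hugging Face `LayoutLMv3Processor` with `apply_ocr=False` and pre-extracted words and boxes. The window/stride sizes, dummy inputs, and label count are illustrative placeholders, not values from this thread:

```python
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification

processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=7  # hypothetical label count
)
model.eval()

# Stand-ins for a real page: OCR words, one 0-1000-normalized box per word.
words = ["word"] * 928
boxes = [[0, 0, 50, 50]] * 928
image = Image.new("RGB", (1000, 1000), "white")

# Overlapping word windows so each chunk fits the 512-token limit.
# Word count is only a proxy for token count, so leave headroom for
# subword splitting and special tokens.
window, stride = 300, 250
predictions = [None] * len(words)

for start in range(0, len(words), stride):
    chunk_words = words[start:start + window]
    chunk_boxes = boxes[start:start + window]
    encoding = processor(
        image, chunk_words, boxes=chunk_boxes,
        truncation=True, padding="max_length",
        max_length=512, return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**encoding).logits  # (1, 512, num_labels)

    # Map token-level predictions back to word indices; in the overlap
    # region, keep the first prediction seen for each word.
    token_preds = logits.argmax(-1).squeeze(0).tolist()
    for tok_idx, word_id in enumerate(encoding.word_ids(0)):
        if word_id is not None and predictions[start + word_id] is None:
            predictions[start + word_id] = token_preds[tok_idx]

    if start + window >= len(words):
        break
```

How much context is lost depends on the overlap: a larger overlap (smaller stride) gives each word more surrounding context at the cost of extra forward passes.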

I have trained a LayoutLMv3 model with `"bbox": Array2D(dtype="int64", shape=(512, 4))`, but my documents have up to 928 boxes, so the trained model does not predict labels for all words (tokens).

I tried replacing the value 512 with 1024 and 2048, but during training I get:
`ValueError: cannot reshape array of size 2048 into shape (1,1024,4)`

Does anyone know how to change the config, or have any other idea for solving this problem?
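
A note on the arithmetic in that error: 2048 = 512 × 4, i.e. the processor is still emitting 512 rows per `bbox` array while the `datasets` schema now declares `(1024, 4)` (which would need 4096 values). Changing the `Array2D` shape alone is not enough; the processor must pad and truncate to the same length. Below is a hedged sketch of keeping the two in sync, assuming a `dataset` with `image`, `words`, `boxes`, and `labels` columns (hypothetical names). This only fixes the reshape error: LayoutLMv3 ships with 512 learned positions, so the model itself must also be extended before it can train at length 1024 (see the solutions linked further down).

```python
from datasets import Array2D, Array3D, Features, Sequence, Value
from transformers import LayoutLMv3Processor

processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)

MAX_LEN = 1024  # must match the schema below AND a model extended to 1024 positions

features = Features({
    "input_ids": Sequence(Value("int64"), length=MAX_LEN),
    "attention_mask": Sequence(Value("int64"), length=MAX_LEN),
    "bbox": Array2D(dtype="int64", shape=(MAX_LEN, 4)),
    "labels": Sequence(Value("int64"), length=MAX_LEN),
    "pixel_values": Array3D(dtype="float32", shape=(3, 224, 224)),
})

def encode(batch):
    # Pad/truncate to the SAME length the schema declares. With the
    # default max_length=512, each bbox array holds 512 * 4 = 2048
    # values, which cannot be reshaped to (1024, 4) -> the ValueError.
    return processor(
        batch["image"], batch["words"], boxes=batch["boxes"],
        word_labels=batch["labels"],
        truncation=True, padding="max_length", max_length=MAX_LEN,
    )

encoded = dataset.map(
    encode, batched=True, features=features,
    remove_columns=dataset.column_names,
)
```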

Hi @rusubbiz-muzkaq,

Did you find a way to work with sequences of more than 512 tokens on LayoutLMv3? I am also getting the same error.

Hi, I have the same problem as @rusubbiz-muzkaq and @jyotiyadav94 and haven't figured it out yet. Any updates?

Edit: NielsRogge/Transformers-Tutorials#203

Hi all,

I got it working for a longer sequence length. See #942 (comment).

Thank you :)

Hi all!
I have explained my solution for handling longer token sequences here; I hope it helps:

huggingface/transformers#19190 (comment)
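
For completeness: the common idea behind these solutions is to enlarge the model's learned 1D position-embedding table, reuse the pretrained rows to initialize the new positions, and then fine-tune. Below is a rough sketch of that pattern; it is not necessarily what the linked comment does, and the attribute paths, the RoBERTa-style offset of 2, and the `position_ids` buffer are assumptions about the Hugging Face implementation:

```python
import torch
from transformers import LayoutLMv3ForTokenClassification

model = LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base")

OFFSET = 2                # RoBERTa-style padding offset (assumption)
NEW_MAX = 1024 + OFFSET   # 1024 usable positions
old_emb = model.layoutlmv3.embeddings.position_embeddings  # attribute path: assumption
old_max, hidden = old_emb.weight.shape                     # (514, 768) on the base model

new_emb = torch.nn.Embedding(NEW_MAX, hidden, padding_idx=old_emb.padding_idx)
with torch.no_grad():
    # Keep the pretrained rows, then tile them over the new positions
    # (one common heuristic; interpolation or random init are alternatives).
    new_emb.weight[:old_max] = old_emb.weight
    for pos in range(old_max, NEW_MAX):
        new_emb.weight[pos] = old_emb.weight[OFFSET + (pos - OFFSET) % (old_max - OFFSET)]

model.layoutlmv3.embeddings.position_embeddings = new_emb
# Some transformers versions cache a position_ids buffer sized to the old
# maximum; re-register it at the new size (newer versions may compute it on the fly).
model.layoutlmv3.embeddings.register_buffer(
    "position_ids", torch.arange(NEW_MAX).expand((1, -1))
)
model.config.max_position_embeddings = NEW_MAX
```

The extended positions are effectively untrained relative to the originals, so expect to fine-tune before the longer range is useful, and remember to pass a matching `max_length` when tokenizing.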