LayoutLMv3 extending language sequence length
ChristiaensBert opened this issue · comments
I want to use LayoutLMv3 on full documents that have a text sequence length of more than 512. Is there a way to extend this and how should it be done?
Alternatively, could I split up the document into 2 sequences and forward them both with the image, or will this lose too much context?
@ChristiaensBert Yes, this is common practice.
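For anyone wanting to try the splitting approach, here is a minimal sketch of overlapping windows over already-tokenized input. The function name `split_into_windows` and the defaults (`max_len=510`, which assumes two slots are reserved for special tokens, and `stride=128`) are illustrative assumptions, not something from the LayoutLMv3 code base. The same page image would be passed along with every window. I believe the Hugging Face processor can also do this for you via `return_overflowing_tokens=True` together with `stride`, but check that against the documentation for your version.

```python
from typing import Dict, List, Sequence


def split_into_windows(
    tokens: Sequence[str],
    boxes: Sequence[Sequence[int]],
    max_len: int = 510,   # assumption: 512 minus two special tokens
    stride: int = 128,    # assumption: number of tokens shared by neighbours
) -> List[Dict[str, list]]:
    """Split aligned token/box lists into overlapping windows."""
    if len(tokens) != len(boxes):
        raise ValueError("tokens and boxes must have the same length")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")

    windows: List[Dict[str, list]] = []
    step = max_len - stride  # how far each window start advances
    start = 0
    while True:
        end = min(start + max_len, len(tokens))
        windows.append(
            {
                "start": start,  # offset of this window in the full document
                "tokens": list(tokens[start:end]),
                "boxes": [list(b) for b in boxes[start:end]],
            }
        )
        if end == len(tokens):
            break
        start += step
    return windows


# Example: a document with 928 tokens, as mentioned later in this thread.
tokens = [f"tok{i}" for i in range(928)]
boxes = [[0, 0, 10, 10] for _ in tokens]
windows = split_into_windows(tokens, boxes)
print([len(w["tokens"]) for w in windows])  # [510, 510, 164]
```

The overlap gives tokens near a window boundary some context on both sides. At prediction time you still need a rule for tokens that appear in two windows, for example keeping the prediction from the window where the token is furthest from the edge.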
I have trained a LayoutLMv3 model with "bbox": Array2D(dtype="int64", shape=(512, 4)), but my documents have up to 928 boxes, so the trained model does not predict labels for all words (tokens).
I tried changing the value 512 to 1024 and to 2048, but during training I get:
ValueError: cannot reshape array of size 2048 into shape (1,1024,4)
Does anyone know how to change the config file, or have any idea how to solve this problem?
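A note on the error itself: 2048 = 512 × 4, so the array being written still has 512 boxes while the declared feature expects 1024 × 4 = 4096 values. A likely cause (an assumption, since the preprocessing code is not shown) is that the encoding step still pads or truncates to 512 while only the feature shape was changed. The sketch below reproduces the arithmetic in plain Python; `pad_or_truncate_boxes` and `flat_size` are hypothetical helpers, not library functions.

```python
from typing import List, Sequence


def flat_size(rows: Sequence[Sequence[int]]) -> int:
    """Number of scalar values in a list of boxes."""
    return sum(len(r) for r in rows)


def pad_or_truncate_boxes(
    boxes: Sequence[Sequence[int]],
    target_len: int,
    pad_box: Sequence[int] = (0, 0, 0, 0),  # assumption: all-zero padding box
) -> List[List[int]]:
    """Return exactly target_len boxes, cutting or padding as needed."""
    rows = [list(b) for b in boxes[:target_len]]
    rows.extend(list(pad_box) for _ in range(target_len - len(rows)))
    return rows


encoded = [[1, 2, 3, 4] for _ in range(512)]  # what a 512-length encoding yields
print(flat_size(encoded))                     # 2048, cannot fill (1, 1024, 4)

fixed = pad_or_truncate_boxes(encoded, 1024)
print(len(fixed), flat_size(fixed))           # 1024 4096
```

Making the shapes agree only removes the reshape error. As far as I know, the pretrained checkpoints have position embeddings sized for 512 tokens, so feeding longer sequences would still need extended embeddings and further training, or the windowing approach discussed above.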
Hi @rusubbiz-muzkaq,
Did you find a way to work with sequences longer than 512 tokens in LayoutLMv3? I am also getting the same error.
Hi, I have the same problem as @rusubbiz-muzkaq and @jyotiyadav94 and haven't figured it out yet. Any updates?
Hi all!
I have explained my solution for handling long token sequences here; I hope it helps: