More than 512 tokens causes an error
RYNEQ opened this issue · comments
Ariyan Eghbal commented
Hi,
Thanks for your work.
I'm trying the model, but when my text is 512 tokens or longer I get:
InvalidArgumentError: indices[0,512] = 512 is not in [0, 512) [Op:ResourceGather]
I know the original BERT is limited to 512 tokens and truncates anything beyond that, but what can I do other than splitting my text into parts shorter than 512 tokens?
Splitting at positions other than punctuation characters can break an entity sequence.
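For reference, this is roughly the kind of punctuation-aware splitting I mean. It is only a minimal sketch, not something from this repo: `chunk_text`, `split_on_punctuation`, and the `count_tokens` callback are hypothetical names, and the whitespace counter in the usage lines is a stand-in for the model's real tokenizer. The default budget of 510 assumes two positions are reserved for `[CLS]` and `[SEP]`.

```python
import re
from typing import Callable, List


def split_on_punctuation(text: str) -> List[str]:
    # Split after sentence-ending punctuation, keeping the punctuation
    # attached to the sentence it closes.
    parts = re.split(r"(?<=[.!?;])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(
    text: str,
    count_tokens: Callable[[str], int],  # hypothetical: returns token count
    max_tokens: int = 510,  # assumption: 512 minus [CLS] and [SEP]
) -> List[str]:
    # Greedily pack whole sentences into chunks that stay within max_tokens,
    # so a cut never lands in the middle of a sentence.
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in split_on_punctuation(text):
        n = count_tokens(sentence)
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


# Usage with a whitespace counter standing in for the real tokenizer.
text = "A b c. D e f! G h? I j k l."
chunks = chunk_text(text, lambda s: len(s.split()), max_tokens=6)
print(chunks)
```

A single sentence that is longer than the budget on its own still ends up as one oversized chunk here, so that case would need extra handling (for example a fallback split on commas or whitespace).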