ValueError: cannot copy sequence with size 798 to array axis with dimension 512
thorunna opened this issue
I have successfully trained a model and am trying to parse sentences but I get the following error when parsing some of the files:
Token indices sequence length is longer than the specified maximum sequence length for this BERT model (798 > 512). Running this sequence through BERT will result in indexing errors
Loading model from ./tools/neuralParser/_dev=83.54.pt...
Parsing sentences...
Traceback (most recent call last):
  File "./tools/neuralParser/src/main.py", line 613, in <module>
    main()
  File "./tools/neuralParser/src/main.py", line 609, in main
    args.callback(args)
  File "./tools/neuralParser/src/main.py", line 492, in run_parse
    predicted, _ = parser.parse_batch(subbatch_sentences)
  File "/users/home/tha86/iceParsingPipeline/tools/neuralParser/src/parse_nk.py", line 1008, in parse_batch
    all_input_ids[snum, :len(input_ids)] = input_ids
ValueError: cannot copy sequence with size 798 to array axis with dimension 512
This error also occurred when training the model, but I got around it by removing some sentences. I have tried changing max_len in several places to find where the error can be handled, but nothing seems to work. I have left sentence_max_len=300 unchanged in main.py, which I thought would prevent this problem, but evidently it does not. Is there a way to handle this error?
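
One possible workaround is to set aside over-long sentences before they reach parse_batch. The sketch below is only an illustration under stated assumptions: it assumes sentence_max_len counts words while the 512 limit in the warning counts BERT subword tokens, and that two special tokens are added per sentence. The names split_by_subword_length and toy_tokenize are hypothetical and not part of the parser; in practice the tokenize argument would be the tokenize method of whatever BERT tokenizer the model was trained with.

```python
from typing import Callable, List, Tuple

BERT_MAX_LEN = 512  # limit quoted in the warning message


def split_by_subword_length(
    sentences: List[List[str]],
    tokenize: Callable[[str], List[str]],
    max_len: int = BERT_MAX_LEN,
    num_special: int = 2,  # assumption: one [CLS] and one [SEP] per sentence
) -> Tuple[List[List[str]], List[List[str]]]:
    """Separate sentences that fit within max_len subword tokens from those that do not."""
    fits, too_long = [], []
    for words in sentences:
        # Count subword pieces over all words, plus the special tokens.
        n_subwords = sum(len(tokenize(w)) for w in words) + num_special
        (fits if n_subwords <= max_len else too_long).append(words)
    return fits, too_long


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer: cuts a word into 3-character pieces."""
    return [word[i:i + 3] for i in range(0, len(word), 3)] or [word]


if __name__ == "__main__":
    short = ["stutt", "setning"]
    long_ = ["langtord"] * 300  # 300 words, yet 3 pieces each
    fits, too_long = split_by_subword_length([short, long_], toy_tokenize)
    print(len(fits), "sentence(s) fit,", len(too_long), "skipped")
```

In the toy example the second sentence has exactly 300 words but 902 subword pieces, which would explain why a word-based limit of 300 does not rule out the 512 error. The skipped sentences could be logged, or split into shorter segments and parsed separately.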