nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.

Home Page: https://parser.kitaev.io/

ValueError: cannot copy sequence with size 798 to array axis with dimension 512

thorunna opened this issue

I have successfully trained a model and am now trying to parse sentences, but I get the following error when parsing some of the files:

Token indices sequence length is longer than the specified maximum  sequence length for this BERT model (798 > 512). Running this sequence through BERT will result in indexing errors
Loading model from ./tools/neuralParser/_dev=83.54.pt...
Parsing sentences...
Traceback (most recent call last):
  File "./tools/neuralParser/src/main.py", line 613, in <module>
    main()
  File "./tools/neuralParser/src/main.py", line 609, in main
    args.callback(args)
  File "./tools/neuralParser/src/main.py", line 492, in run_parse
    predicted, _ = parser.parse_batch(subbatch_sentences)
  File "/users/home/tha86/iceParsingPipeline/tools/neuralParser/src/parse_nk.py", line 1008, in parse_batch
    all_input_ids[snum, :len(input_ids)] = input_ids
ValueError: cannot copy sequence with size 798 to array axis with dimension 512

This error also occurred while training the model, but I worked around it by removing some sentences. I have tried changing max_len in several places to find where the error can be handled, but nothing seems to work. I have left sentence_max_len=300 unchanged in main.py, which I expected to prevent this problem, but evidently it does not. Is there a way to handle this error?
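
For context, the warning above reports the length in BERT token indices (798 > 512), so the limit being hit is on subword tokens rather than on words. If sentence_max_len counts words (an assumption, not confirmed here), a sentence of fewer than 300 words can still exceed 512 subwords. Below is a minimal sketch of a pre-filter that sets such sentences aside before parsing. The names toy_subword_tokenize, subword_length and split_by_bert_limit are hypothetical and not part of the parser; the toy tokenizer only stands in for the real BERT tokenizer, which would be passed in its place.

```python
BERT_MAX_LEN = 512  # subword limit reported in the warning and traceback


def toy_subword_tokenize(word, piece_len=3):
    """Hypothetical stand-in for a WordPiece tokenizer.

    Splits a word into fixed-size character pieces so the sketch runs
    without any model files. Swap in the real tokenizer's tokenize method.
    """
    pieces = [word[i:i + piece_len] for i in range(0, len(word), piece_len)]
    return pieces or [word]


def subword_length(sentence, tokenize=toy_subword_tokenize):
    """Count subword tokens in a list of words, plus 2 for [CLS] and [SEP]."""
    return sum(len(tokenize(word)) for word in sentence) + 2


def split_by_bert_limit(sentences, max_len=BERT_MAX_LEN,
                        tokenize=toy_subword_tokenize):
    """Separate sentences that fit within max_len subwords from those that do not."""
    keep, skipped = [], []
    for sentence in sentences:
        if subword_length(sentence, tokenize) <= max_len:
            keep.append(sentence)
        else:
            skipped.append(sentence)
    return keep, skipped


if __name__ == "__main__":
    short = ["a"] * 10            # 10 subwords + 2 special tokens
    long = ["abcdefghi"] * 200    # 200 words, but 600 subwords + 2
    keep, skipped = split_by_bert_limit([short, long])
    print(len(keep), "kept,", len(skipped), "skipped")
```

The second example sentence has only 200 words, which would pass a 300-word limit, yet it is skipped because its subword count is over 512. The skipped sentences could then be logged, split, or handled separately instead of crashing the batch.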