princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

ValueError: 50264 is not in list

YuffieHuang opened this issue

Hi @gaotianyu1350!

I think I've run into the same error as described in #10:

  Traceback (most recent call last):
    File "run.py", line 628, in <module>
      main()
    File "run.py", line 461, in main
      if training_args.do_eval
    File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 465, in __init__
      verbose=True if _ == 0 else False,
    File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 585, in convert_fn
      other_sent_limit=self.args.other_sent_limit,
    File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 244, in tokenize_multipart_input
      mask_pos = [input_ids.index(tokenizer.mask_token_id)]
  ValueError: 50264 is not in list

I'm running the code on my own sentiment analysis dataset, which is similar to sst-5. The language model is RoBERTa-base, and the Transformers version is 3.4.0. The difference is that my dataset has only 3 labels (negative/neutral/positive) instead of sst-5's 5, so I modified "src/processors.py" to change the number of labels from 5 to 3 and then ran the code on my own dataset under the task name "sst-5".
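
Concretely, the edit amounts to something like the following (a simplified sketch of my change; the real processor class in "src/processors.py" is structured differently):

    # Simplified, hypothetical illustration of the label-list change;
    # the actual processor in src/processors.py is more elaborate.
    class TextClassificationProcessor:
        def __init__(self, task_name):
            self.task_name = task_name

        def get_labels(self):
            if self.task_name == "sst-5":
                return list(range(3))  # was list(range(5)); negative/neutral/positive
            raise ValueError(f"Unsupported task: {self.task_name}")

    print(TextClassificationProcessor("sst-5").get_labels())  # [0, 1, 2]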

I'm not sure whether this is the right approach. The provided sst-5 example runs fine, but my own dataset fails with the error above. Could you please help me out? Thank you!

Hi,

The error is triggered because the mask token is truncated away when the input exceeds the maximum sequence length. Using a larger max length should fix it.
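
To illustrate the mechanism with a minimal sketch (plain Hugging Face tokenizer calls, not the repo's own code): 50264 is RoBERTa's <mask> token id, and if the template's mask falls beyond max_length, truncation drops it, so the input_ids.index(tokenizer.mask_token_id) call in src/dataset.py has nothing to find:

    # Minimal reproduction of the failure mode (a sketch, not LM-BFF code).
    from transformers import RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

    # A long input followed by an LM-BFF-style template ending in <mask>.
    text = " ".join(["word"] * 100) + " It was " + tokenizer.mask_token + " ."

    for max_length in (32, 256):
        input_ids = tokenizer(text, truncation=True, max_length=max_length)["input_ids"]
        if tokenizer.mask_token_id in input_ids:
            print(f"max_length={max_length}: mask kept at position "
                  f"{input_ids.index(tokenizer.mask_token_id)}")
        else:
            # This is the state that triggers "ValueError: 50264 is not in list".
            print(f"max_length={max_length}: mask truncated away")

The exact flag for raising the limit should be checked against run.py's data arguments (a HuggingFace-style --max_seq_length is the likely candidate); if your inputs are long, the per-sentence caps visible in the traceback (first_sent_limit, other_sent_limit) may need raising as well.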

It works! Thank you for your quick help.