ValueError: 50264 is not in list
YuffieHuang opened this issue
Hi @gaotianyu1350!
I think I'm hitting the same error as described in #10:
Traceback (most recent call last):
  File "run.py", line 628, in <module>
    main()
  File "run.py", line 461, in main
    if training_args.do_eval
  File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 465, in __init__
    verbose=True if _ == 0 else False,
  File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 585, in convert_fn
    other_sent_limit=self.args.other_sent_limit,
  File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 244, in tokenize_multipart_input
    mask_pos = [input_ids.index(tokenizer.mask_token_id)]
ValueError: 50264 is not in list
I ran the code on my own sentiment analysis dataset, which is similar to sst-5. The language model is RoBERTa-base and the Transformers version is 3.4.0. The difference from sst-5 is that my dataset has only 3 labels (negative/neutral/positive) instead of 5. I therefore modified "src/processors.py" to change the number of labels from 5 to 3, and then ran the code on my own dataset with the task name "sst-5".
I'm not sure whether this is a good approach. The sst-5 example works well, but my own test case does not run properly. Can you please help me out? Thank you!
Hi,
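The error is triggered because the mask token gets truncated: the input is longer than the max length, so the part of the sequence containing the mask token is cut off and `input_ids.index(tokenizer.mask_token_id)` cannot find it. You can use a larger max length to avoid this problem.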
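To make the failure mode concrete, here is a minimal sketch that reproduces it without loading a tokenizer. It assumes plain tail truncation (the repo's actual truncation logic may differ), and the helper `find_mask_position` and the token ids other than 50264 are hypothetical, made up for illustration.

```python
MASK_TOKEN_ID = 50264  # mask token id reported in the traceback


def find_mask_position(input_ids, max_length, mask_token_id=MASK_TOKEN_ID):
    """Hypothetical helper: truncate to max_length, then locate the mask token.

    Raises a more descriptive error than list.index when the mask was cut off.
    """
    truncated = input_ids[:max_length]  # assumption: simple tail truncation
    if mask_token_id not in truncated:
        raise ValueError(
            f"mask token {mask_token_id} was truncated: input has "
            f"{len(input_ids)} tokens but max_length is {max_length}"
        )
    return truncated.index(mask_token_id)


# Made-up encoded prompt: <s>, 10 sentence tokens, a 4-token template
# containing the mask, and </s>. Only 50264 comes from the issue.
input_ids = [0] + list(range(100, 110)) + [243, 21, MASK_TOKEN_ID, 4, 2]

# With a large enough limit the mask survives truncation.
print(find_mask_position(input_ids, max_length=16))

# With too small a limit, the bare list.index call from the traceback fails.
try:
    input_ids[:12].index(MASK_TOKEN_ID)
except ValueError as err:
    print(err)
```

A check like this before calling `index` does not fix the truncation itself, but it makes the cause obvious: the sequence is too long for the configured limit, so either the limit has to grow or the sentence has to be shortened before the template is appended.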
It works! Thank you for your quick help.