princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

ValueError: 50264 is not in list

YuffieHuang opened this issue

Hi @gaotianyu1350!

I think I've run into the same error as described in #10:

  Traceback (most recent call last):
    File "run.py", line 628, in <module>
      main()
    File "run.py", line 461, in main
      if training_args.do_eval
    File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 465, in __init__
      verbose=True if _ == 0 else False,
    File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 585, in convert_fn
      other_sent_limit=self.args.other_sent_limit,
    File "/Users/yfhuang/Documents/GitHub/LM-BFF/src/dataset.py", line 244, in tokenize_multipart_input
      mask_pos = [input_ids.index(tokenizer.mask_token_id)]
  ValueError: 50264 is not in list

I'm running the code on my own sentiment analysis dataset, which is similar to sst-5. The language model is RoBERTa-base, and the Transformers version is 3.4.0. The difference is that my dataset has only 3 labels (negative/neutral/positive) instead of sst-5's 5, so I modified "src/processors.py" to change the number of labels from 5 to 3 and then ran the code on my own dataset under the task name "sst-5".
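
Concretely, the edit amounts to something like the following (a simplified sketch of my change; the real processor class in "src/processors.py" is structured differently):

    # Simplified, hypothetical illustration of the label-list change;
    # the actual processor in src/processors.py is more elaborate.
    class TextClassificationProcessor:
        def __init__(self, task_name):
            self.task_name = task_name

        def get_labels(self):
            if self.task_name == "sst-5":
                return list(range(3))  # was list(range(5)); negative/neutral/positive
            raise ValueError(f"Unsupported task: {self.task_name}")

    print(TextClassificationProcessor("sst-5").get_labels())  # [0, 1, 2]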

I'm not sure whether this is the right approach. The provided sst-5 example runs fine, but my own dataset fails with the error above. Could you please help me out? Thank you!

Hi,

The error is triggered because the mask token is truncated away when the input exceeds the maximum sequence length. Using a larger max length should fix it.
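
To illustrate the mechanism with a minimal sketch (plain Hugging Face tokenizer calls, not the repo's own code): 50264 is RoBERTa's <mask> token id, and if the template's mask falls beyond max_length, truncation drops it, so the input_ids.index(tokenizer.mask_token_id) call in src/dataset.py has nothing to find:

    # Minimal reproduction of the failure mode (a sketch, not LM-BFF code).
    from transformers import RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

    # A long input followed by an LM-BFF-style template ending in <mask>.
    text = " ".join(["word"] * 100) + " It was " + tokenizer.mask_token + " ."

    for max_length in (32, 256):
        input_ids = tokenizer(text, truncation=True, max_length=max_length)["input_ids"]
        if tokenizer.mask_token_id in input_ids:
            print(f"max_length={max_length}: mask kept at position "
                  f"{input_ids.index(tokenizer.mask_token_id)}")
        else:
            # This is the state that triggers "ValueError: 50264 is not in list".
            print(f"max_length={max_length}: mask truncated away")

The exact flag for raising the limit should be checked against run.py's data arguments (a HuggingFace-style --max_seq_length is the likely candidate); if your inputs are long, the per-sentence caps visible in the traceback (first_sent_limit, other_sent_limit) may need raising as well.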

It works! Thank you for your quick help.