princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Testing: New Data for GLUE Tasks

YashBit opened this issue · comments

Now, I can see that there was another issue similar to this. However, I am still not clear on how to deal with out-of-distribution (OOD) test data.

I want to train and validate on the original train.tsv and dev.tsv in the ORIGINAL folder, but test on an out-of-distribution dataset.

So, let's say I trained roberta-base on SST-2 and want to test it on IMDB. How should I go about it? Currently, I replace test.tsv in the ORIGINAL folder and generate the K-shot data. Then I run the commands given in the README on the repo page. However, the test eval accuracy is the same as with the original SST-2 test dataset. I don't know what is happening here. To reiterate:

My objective:

  1. Test IMDB on roberta-base (seed 42) trained on SST-2, while training and validating on the original data provided with the repo.

Action:

  1. Replace test.tsv in the ORIGINAL SST-2 folder with IMDB.

Observed Behaviour:

  1. The test eval accuracy is the same as the original, as if test.tsv had not been replaced.

Expected Behaviour:

  1. Same train and dev accuracy, but a different test accuracy.

Request:

  1. Please help :) We replaced the original test.tsv and then generated the K-shot data again, but there was no change.
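In case it helps with reproducing this setup, here is a sketch of writing the replacement test.tsv. The `sentence<TAB>label` header and 0/1 labels are assumptions about SST-2's file layout, not confirmed from the repo; check your original tsv files for the exact format.

```python
import csv

def write_sst2_style_tsv(examples, path):
    """Write (text, label) pairs in an assumed SST-2 tsv layout:
    a 'sentence<TAB>label' header, then one example per line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["sentence", "label"])
        for text, label in examples:
            # Flatten tabs/newlines so each example stays on one line.
            writer.writerow([text.replace("\t", " ").replace("\n", " "), label])

# e.g. write_sst2_style_tsv(imdb_pairs, "data/original/SST-2/test.tsv")
```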

Hi,

Make sure your cache files are deleted, or use a completely separate data directory/file naming from the original, or pass the cache-overwrite flag. The data loader will load existing cached torch files if --overwrite_cache is not set, which is the default.

Reference:

# Cache name distinguishes mode, task name, tokenizer, and length. So if you change anything beyond these elements, make sure to clear your cache.
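A minimal sketch for clearing the cache, assuming the cached feature files follow the usual `cached_*` naming convention inside the data directories (verify the actual file names in your data folders before deleting):

```python
import glob
import os

def clear_cached_features(data_dir):
    """Delete cached_* feature files under data_dir so the data
    loader re-tokenizes the (replaced) tsv files on the next run."""
    removed = []
    pattern = os.path.join(data_dir, "**", "cached_*")
    for path in glob.glob(pattern, recursive=True):
        os.remove(path)
        removed.append(path)
    return removed

# e.g. clear_cached_features("data/k-shot/SST-2")
```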

Ok, I will delete the cache directories @ajfisch. So should I replace the test.tsv files in the ORIGINAL folder for all tasks in a similar manner?

Yes. Either delete the existing cache files (the code will then regenerate the missing files), or save the alternate data to a new data directory (so the cache files are saved to and loaded from new_data_dir/<cache_file_name>). Both should work.

Hi,
Thanks again for the great work.

Today I actually encountered the same error as issue #7 when testing a model prompt-tuned on SST-2 directly on the IMDB movie review dataset, by replacing the dev.tsv in /original with the IMDB dataset, as mentioned in issue #14.

What I did:

  1. Prompt-tune a model checkpoint on SST-2 and save the model.
  2. Replace data/original/SST-2/dev.tsv with my own IMDB dataset, formatted correctly.
  3. Run tools/generate_k_shot.py again; data/k-shot/SST-2/test.tsv now contains IMDB.
  4. Load the model from step 1 with --no_train, --do_predict, --overwrite_cache, and the other necessary flags to zero-shot on the IMDB dataset. I also cleared the cache before running it.
    An error occurs:
    Traceback (most recent call last):
      File "run.py", line 628, in <module>
        main()
      File "run.py", line 466, in main
        if training_args.do_predict
      File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 465, in __init__
        verbose=True if _ == 0 else False,
      File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 585, in convert_fn
        other_sent_limit=self.args.other_sent_limit,
      File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 243, in tokenize_multipart_input
        mask_pos = [input_ids.index(tokenizer.mask_token_id)]
    ValueError: 50264 is not in list
    This "50264" is the same error as in issue #7
    Sorry for the inconvenience but do you happen to know what might went wrong?

Many thanks.
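For what it's worth, one plausible cause (a guess, not confirmed for this case): IMDB reviews are much longer than SST-2 sentences, so the template-filled input can exceed the maximum sequence length, and truncation drops the <mask> token; input_ids.index(...) then raises exactly this ValueError. A minimal sketch with made-up token ids:

```python
# Hypothetical illustration: the mask token sits after the input text
# in the template, so truncating a long input can cut it off.
MASK_TOKEN_ID = 50264  # RoBERTa's <mask> token id
max_seq_length = 8

# Pretend token ids for a long review, with the mask appended last.
input_ids = [101, 102, 103, 104, 105, 106, 107, 108, 109, MASK_TOKEN_ID]
input_ids = input_ids[:max_seq_length]  # truncation drops the mask

try:
    mask_pos = [input_ids.index(MASK_TOKEN_ID)]
except ValueError as exc:
    print(exc)  # 50264 is not in list
```

If this is the cause, raising the max sequence length or truncating the review text before filling the template should avoid it.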

Making another issue, since the new error is different.