Testing: New Data for GLUE Tasks
YashBit opened this issue
I can see that there was another issue similar to this one. However, I am still not clear on how to deal with out-of-distribution (OOD) test data.
I want to train and validate on the original train.tsv and dev.tsv in the ORIGINAL folder, but test on an out-of-distribution dataset.
So, let's say I want to test SST-2 on IMDB for roberta-base. How should I go about it? Currently, I replace test.tsv in the ORIGINAL folder and generate the K-shot data. Then I run the commands given in the README on the repo page. However, the test eval accuracy is the same as on the original SST-2 test dataset. I don't know what is happening here. To reiterate:
My objective:
- Test roberta-base (seed 42, SST-2) on IMDB, but train and validate on the original data provided with the repo.
Action:
- Replace test.tsv of ORIGINAL SST-2 with IMDB.
Observed Behaviour:
- Same test eval accuracy as with the original data, as if test.tsv had not been replaced.
Expected Behaviour:
- Same train and dev accuracy, different test accuracy.
Request:
- Please help :) We changed the original test.tsv and then generated K shot again, but there was no change.
Hi,
Make sure you either delete your cache files, use a completely separate data directory/file naming from the original, or specify the cache overwrite flag. The data loader will load existing cached torch files if --overwrite_cache is not set, and it is not set by default.
Reference:
Line 318 in 1bbdc42
Ok, I will delete the cache directories @ajfisch. So I should replace the test.tsv files in the original folder for all tasks in a similar manner?
Yes, either approach should work: delete the existing cache files (the code will then regenerate the missing files), or save the alternate data to a new data directory (the cache files will then be saved to and loaded from new_data_dir/<cache_file_name>).
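The caching behaviour described in this reply can be illustrated with a minimal sketch. It is not the repository's code: `load_or_build_features`, `demo`, and the cache file name are hypothetical, and `pickle` stands in for the torch serialization so the example needs only the standard library. It shows why a replaced test.tsv has no effect while a stale cache file still exists.

```python
import os
import pickle
import tempfile


def load_or_build_features(data_dir, cache_name, build_fn, overwrite_cache=False):
    """Return cached features if present, otherwise build and cache them.

    Hypothetical helper; it only mirrors the load-if-cached pattern
    described in the reply above.
    """
    cache_path = os.path.join(data_dir, cache_name)
    if os.path.exists(cache_path) and not overwrite_cache:
        # A stale cache wins here, even if the .tsv files changed on disk.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features


def demo():
    """Build once from 'SST-2', then swap the source to 'IMDB'."""
    with tempfile.TemporaryDirectory() as data_dir:
        first = load_or_build_features(data_dir, "cached_test", lambda: ["sst2"])
        # The source changed, but the cache is still served.
        stale = load_or_build_features(data_dir, "cached_test", lambda: ["imdb"])
        # Forcing a rebuild picks up the new data.
        fresh = load_or_build_features(
            data_dir, "cached_test", lambda: ["imdb"], overwrite_cache=True
        )
    return first, stale, fresh
```

In this sketch the second call still returns the SST-2 features, which matches the unchanged test accuracy reported above. Only the call with `overwrite_cache=True` (or a fresh directory) sees the IMDB data.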
Hi,
Thanks again for the great work.
Today I encountered the same error as in issue #7 when testing a model prompt-tuned on SST-2 directly on the IMDB movie review dataset, by replacing the dev.tsv in /original with the IMDB dataset, as mentioned in issue #14.
What I did:
- prompt-tune a model checkpoint on SST-2, and save the model
- replace data/original/SST-2/dev.tsv with my own IMDB dataset, formatted correctly
- run tools/generate_k_shot.py again; data/k-shot/SST-2/test.tsv now contains the IMDB data
- load the model saved in the first step and pass --no_train, --do_predict, --overwrite_cache, and other necessary flags to evaluate zero-shot on the IMDB dataset. I also cleared the cache before running it.
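The formatting step in the list above could look roughly like the following sketch. It assumes the target layout is the GLUE SST-2 dev.tsv convention (a `sentence<TAB>label` header, with label 1 for positive and 0 for negative) and that raw IMDB reviews contain `<br />` tags. The function name `to_sst2_style_tsv` is hypothetical and not part of the repository.

```python
def to_sst2_style_tsv(examples):
    """Render (text, label) pairs as SST-2 style TSV text.

    Whitespace (including tabs and newlines) is collapsed so that each
    review stays on one line and cannot break the two-column layout.
    """
    lines = ["sentence\tlabel"]
    for text, label in examples:
        # IMDB reviews often carry HTML line breaks; replace them with spaces.
        clean = " ".join(text.replace("<br />", " ").split())
        lines.append(f"{clean}\t{int(label)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [("Great film.<br /><br />Loved it", 1), ("Dull.", 0)]
    print(to_sst2_style_tsv(sample), end="")
```

The returned string can be written to the replacement dev.tsv before running tools/generate_k_shot.py again.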
Error occurs.
Traceback (most recent call last):
  File "run.py", line 628, in <module>
    main()
  File "run.py", line 466, in main
    if training_args.do_predict
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 465, in __init__
    verbose=True if _ == 0 else False,
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 585, in convert_fn
    other_sent_limit=self.args.other_sent_limit,
  File "/home/yb1025/Research/ML_2/robustness/LM-BFF/src/dataset.py", line 243, in tokenize_multipart_input
    mask_pos = [input_ids.index(tokenizer.mask_token_id)]
ValueError: 50264 is not in list
This "50264" is the same error as in issue #7
Sorry for the inconvenience, but do you happen to know what might have gone wrong?
Many thanks.
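One possible cause, not confirmed in this thread, is that long IMDB reviews push the prompt past the maximum sequence length, so the mask token is truncated before `input_ids.index(...)` runs. The sketch below reproduces that failure mode with plain lists. `build_input`, `find_mask`, the template ids, and the `truncate_head` parameter are hypothetical stand-ins, not the repository's functions. Only the id 50264 and the `.index` call come from the traceback.

```python
MASK_ID = 50264  # the id reported in the traceback (tokenizer.mask_token_id)


def find_mask(input_ids, mask_id=MASK_ID):
    """Mirror the failing line: list.index raises ValueError if the id is absent."""
    return input_ids.index(mask_id)


def build_input(sentence_ids, template_ids, max_length, truncate_head=False):
    """Toy prompt construction: sentence tokens followed by template tokens."""
    ids = sentence_ids + template_ids
    if len(ids) > max_length:
        # Tail truncation drops the template (and its mask) for long inputs;
        # head truncation keeps the end of the sequence instead.
        ids = ids[-max_length:] if truncate_head else ids[:max_length]
    return ids


template = [85, 21, MASK_ID, 4]       # hypothetical ids for a template with a mask
short_review = list(range(100, 110))  # 10 tokens, fits easily
long_review = list(range(100, 700))   # 600 tokens, longer than max_length

ok = find_mask(build_input(short_review, template, max_length=128))

try:
    find_mask(build_input(long_review, template, max_length=128))
    tail_truncation_failed = False
except ValueError:
    # Same failure shape as "ValueError: 50264 is not in list".
    tail_truncation_failed = True

head = find_mask(build_input(long_review, template, max_length=128, truncate_head=True))
```

If this is what happens here, shortening the reviews or limiting the sentence length before the template is appended would keep the mask in the input. The traceback shows an `other_sent_limit` argument, so the repository may already offer length controls worth checking.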
Opening a separate issue, since this new error is different.