ARCD AraBERTv0.1 Results
YousefGh opened this issue
I think the results reported for AraBERTv0.1 on ARCD are affected by data leakage. I replicated your new pipeline (using arcd_preprocessing.py) and got these results:
Results: {'exact': 31.623931623931625, 'f1': 67.4479996189414, ..}
These are very similar to the results alyafeai reported in "Replicate SQuAD results" (#30). After that, I looked at another issue, "Question Answering training data" (#23), where you added this code snippet:
from SOQAL.data_helpers.data_split import train_test_split, combine_json_files
train_test_split("SOQAL/data/arcd.json",0.5)
combine_json_files(["SOQAL/data/Arabic-SQuAD.json","SOQAL/data/arcd-test.json"])
Here you combined arcd-test.json with Arabic-SQuAD.json to produce turk_combined, and then used arcd_preprocessing.py to get turk_combined_all_pre.json and arcd-test-pre.json, like this:
python arcd_preprocessing.py \
--input_file="/PATH_TO/arcd-test.json" \
--output_file="arcd-test-pre.json" \
--do_farasa_tokenization=True \
--use_farasapy=True
python arcd_preprocessing.py \
--input_file="/PATH_TO/turk_combined.json" \
--output_file="turk_combined_all_pre.json" \
--do_farasa_tokenization=True \
--use_farasapy=True
The problem here is that you combined the test set, arcd-test.json, with Arabic-SQuAD.json, and the combined file is used for fine-tuning. The model was then evaluated on arcd-test.json, which it had already seen during fine-tuning. Of course, this is speculation: you might have mistakenly written arcd-test.json instead of arcd-train.json only in the reply and not in the actual code. So I deliberately leaked the test set into Arabic-SQuAD.json, as the code snippet above does, and got:
Results: {'exact': 49.14529914529915, 'f1': 80.06334012841286, ..}
These are very similar to the reported results for AraBERTv0.1 on ARCD. Can you please check whether I'm missing something?
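For anyone who wants to check for this kind of overlap directly, one option is to compare the (context, question) pairs in the fine-tuning file against those in the test file. The following is only a minimal sketch, assuming both files follow the SQuAD v1.1 JSON layout (data -> paragraphs -> qas). The helper names qa_pairs, leaked_pairs, and load_squad are hypothetical and are not part of SOQAL or AraBERT.

```python
import json


def qa_pairs(squad_dict):
    """Collect every (context, question) pair from a SQuAD-style dict.

    Assumes the SQuAD v1.1 layout: data -> paragraphs -> qas.
    """
    return {
        (paragraph["context"].strip(), qa["question"].strip())
        for article in squad_dict["data"]
        for paragraph in article["paragraphs"]
        for qa in paragraph["qas"]
    }


def leaked_pairs(train_dict, test_dict):
    """Return the pairs that appear in both the training and the test data."""
    return qa_pairs(train_dict) & qa_pairs(test_dict)


def load_squad(path):
    """Hypothetical helper: read a SQuAD-style JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With the file names from this thread, calling leaked_pairs(load_squad("turk_combined_all_pre.json"), load_squad("arcd-test-pre.json")) should return an empty set if no test example ended up in the fine-tuning data. A non-empty result would point to leakage. Exact string matching only works when both files went through the same preprocessing.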
Yes, it seems that I actually combined the test JSON instead of the training one by mistake.
Thank you for the notice, I will update the results in the table asap.