Description of data in splits folder

JurijsNazarovs opened this issue 2 years ago.

Hello! Thank you for your great work. I was able to download the files from Dropbox. However, could you add a description of the datasets in the splits directory? I am not sure which files correspond to VQA v2, which to VQA-paraphrase, and so on (e.g., splits/questions_train_aug.pkl).

Jurijs

Yash Kant · Answer 1 · Sat Jun 11 2022 22:21:25 GMT+0800 (China Standard Time)

Hi, thanks for checking out the repo.

Does this line answer your questions? https://github.com/yashkant/concat-vqa/blob/master/configs/baseline-train.yml#L37

Please let me know if I misunderstood anything, thanks.

Jurijs Nazarovs · Answer 2 · Tue Jun 14 2022 00:59:28 GMT+0800 (China Standard Time)

Hi. Thanks for the prompt reply. Unfortunately, that line did not help me. As I understand from the paper, you use two datasets: VQA-V2 and VQA-Rephrasing. However, you also mention data with questions rephrased by back-translation and by humans. Could you list which split corresponds to which data? For example, does splits/questions_train.pkl correspond to the VQA-V2 training set, and splits/questions_train_aug.pkl to the training part of VQA-Rephrasing? (I am not sure whether this is correct.)

Also, in Table 2 of the paper, are the CS(3) and CS(4) scores computed on validation or training data?

Jurijs

Jurijs Nazarovs · Answer 3 · Tue Jun 14 2022 05:38:22 GMT+0800 (China Standard Time)

To be more precise, here is the chunk of code that loads data based on task_cfg:

```python
# Each split key maps to [question file, answer/target file, split name].
# The "test" split has no target file, so its second entry is an empty string.
split_path_dict = {
    "train_aug": [
        "data-release/splits/questions_train_aug.pkl",
        "data-release/splits/ans_train_aug.pkl",
        "train",
    ],
    "train": [
        "data-release/splits/v2_OpenEnded_mscoco_train2014_questions.json",
        "data-release/splits/train_target.pkl",
        "train",
    ],
    "val": [
        "data-release/splits/v2_OpenEnded_mscoco_val2014_questions.json",
        "data-release/splits/val_target.pkl",
        "val",
    ],
    "val_aug": [
        "data-release/splits/questions_val_aug.pkl",
        "data-release/splits/ans_val_aug.pkl",
        "val",
    ],
    "test": [
        "data-release/splits/v2_OpenEnded_mscoco_test2015_questions.json",
        "",
        "test",
    ],
    "trainval_aug": [
        "data-release/splits/questions_trainval_aug.pkl",
        "data-release/splits/ans_trainval_aug.pkl",
        "trainval",
    ],
    "revqa": [
        "data-release/splits/revqa_total_proc.pkl",
        "data-release/splits/revqa_total_proc_target.pkl",
        "revqa",
    ],
}
```

Could you explain which file corresponds to which dataset: VQA-V2, VQA-Rephrasing, or the back-translation data?
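A minimal sketch of how one entry of the split_path_dict above might be read. It assumes the .json files follow the standard VQA v2 question format (a top-level "questions" list) and that the .pkl files are ordinary Python pickles whose contents are returned unchanged. The helper names `load_split` and `_load_file` are hypothetical and are not the repository's own loader:

```python
import json
import pickle
from pathlib import Path
from typing import Any, Optional, Tuple


def _load_file(path: str) -> Any:
    """Load a .json or .pkl file, choosing the reader by file extension."""
    suffix = Path(path).suffix
    if suffix == ".json":
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: VQA v2 question files keep the question list under "questions".
        return data.get("questions", data) if isinstance(data, dict) else data
    if suffix == ".pkl":
        # Only unpickle files from a source you trust.
        with open(path, "rb") as f:
            return pickle.load(f)
    raise ValueError(f"Unsupported file type: {path}")


def load_split(entry: list) -> Tuple[Any, Optional[Any], str]:
    """Hypothetical helper: entry = [question_path, target_path, split_name].

    An empty target path (as for "test") means no answers are available,
    so None is returned in its place.
    """
    question_path, target_path, split_name = entry
    questions = _load_file(question_path)
    targets = _load_file(target_path) if target_path else None
    return questions, targets, split_name
```

For example, `questions, targets, split = load_split(split_path_dict["val"])` would return the VQA v2 validation questions together with the contents of val_target.pkl.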

Yash Kant · Answer 4 · Fri Jun 24 2022 02:35:17 GMT+0800 (China Standard Time)

Sorry for the delay.

train, val, test -- VQAv2 dataset
train_aug, val_aug, trainval_aug -- VQAv2 augmented with questions rephrased via back-translation
revqa -- VQA-Rephrasings dataset by Meet Shah et al.
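The mapping in this answer can be captured as a small lookup table, for example to check which dataset a config's split key refers to. This is only a sketch: the names `SPLIT_TO_DATASET`, `dataset_for`, and `splits_for` are hypothetical and are not part of the concat-vqa repository.

```python
from typing import Dict, List

# Split keys as used in split_path_dict, mapped to their source dataset
# (as described in the maintainer's answer above).
SPLIT_TO_DATASET: Dict[str, str] = {
    "train": "VQAv2",
    "val": "VQAv2",
    "test": "VQAv2",
    "train_aug": "VQAv2 + back-translated rephrasings",
    "val_aug": "VQAv2 + back-translated rephrasings",
    "trainval_aug": "VQAv2 + back-translated rephrasings",
    "revqa": "VQA-Rephrasings (Shah et al.)",
}


def dataset_for(split: str) -> str:
    """Return the dataset a split key belongs to; raise KeyError if unknown."""
    try:
        return SPLIT_TO_DATASET[split]
    except KeyError:
        raise KeyError(f"Unknown split key: {split!r}") from None


def splits_for(dataset: str) -> List[str]:
    """Return all split keys drawn from the given dataset, in table order."""
    return [s for s, d in SPLIT_TO_DATASET.items() if d == dataset]


if __name__ == "__main__":
    print(dataset_for("train_aug"))  # VQAv2 + back-translated rephrasings
    print(splits_for("VQAv2"))       # ['train', 'val', 'test']
```

Keeping the table separate from the file paths means the dataset labels can be checked without having the data downloaded.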