yashkant / concat-vqa

Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021

Home Page: https://yashkant.github.io/projects/concat-vqa.html

Description of data in splits folder

JurijsNazarovs opened this issue

Hello! Thank you for your great work. I was able to download the files from Dropbox. However, could you add a description of the datasets in the splits directory? I am not sure which files correspond to VQA V2, which to VQA-paraphrase, and so on (e.g. splits/questions_train_aug.pkl).

Jurijs

Hi, thanks for checking out the repo.

Does this line — https://github.com/yashkant/concat-vqa/blob/master/configs/baseline-train.yml#L37 — answer your questions?

Please let me know if I misunderstood anything, thanks.

Hi. Thanks for the prompt reply. Unfortunately, that line did not help me. As I understand from the paper, you use two datasets, VQA-V2 and VQA-Rephrasing. But you also mention that you have data with questions rephrased by Back Translation and by humans. Could you list which split corresponds to which data? For example, splits/questions_train.pkl would correspond to the VQA-V2 training set, and splits/questions_train_aug.pkl to the training part of VQA-Rephrasing (I am not sure whether this is correct).

By the way, in Table 2 of the paper, where you provide the CS(3) and CS(4) scores, are those computed on the validation data or the training data?

Jurijs

To be more precise, here is the chunk of code that loads data based on task_cfg:

# Each entry: [questions file, answers/targets file, split name]
split_path_dict = {
    "train_aug": [
        "data-release/splits/questions_train_aug.pkl",
        "data-release/splits/ans_train_aug.pkl",
        "train",
    ],
    "train": [
        "data-release/splits/v2_OpenEnded_mscoco_train2014_questions.json",
        "data-release/splits/train_target.pkl",
        "train",
    ],
    "val": [
        "data-release/splits/v2_OpenEnded_mscoco_val2014_questions.json",
        "data-release/splits/val_target.pkl",
        "val",
    ],
    "val_aug": [
        "data-release/splits/questions_val_aug.pkl",
        "data-release/splits/ans_val_aug.pkl",
        "val",
    ],
    "test": [
        "data-release/splits/v2_OpenEnded_mscoco_test2015_questions.json",
        "",  # no answers file is listed for the test split
        "test",
    ],
    "trainval_aug": [
        "data-release/splits/questions_trainval_aug.pkl",
        "data-release/splits/ans_trainval_aug.pkl",
        "trainval",
    ],
    "revqa": [
        "data-release/splits/revqa_total_proc.pkl",
        "data-release/splits/revqa_total_proc_target.pkl",
        "revqa",
    ],
}  # closing brace added; the excerpt above was quoted without it

Could you explain which file corresponds to which dataset among VQA-V2, VQA-Rephrasing, and Back Translation?

Sorry for the delay.

train, val, test -- VQAv2 dataset
train_aug, val_aug, trainval_aug -- VQAv2 augmented with rephrased questions from Back Translation
revqa -- VQA-Rephrasings dataset by Meet Shah et al.
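
For quick reference, the mapping above can be written down as a small lookup table. This is only a sketch: the names SPLIT_SOURCE and describe_split are hypothetical and do not exist in the repo, and the description strings simply restate the three lines above.

```python
# Hypothetical helper summarizing which dataset each split key draws from.
# SPLIT_SOURCE and describe_split are illustrative names, not part of concat-vqa.
SPLIT_SOURCE = {
    "train": "VQAv2",
    "val": "VQAv2",
    "test": "VQAv2",
    "train_aug": "VQAv2 augmented with Back Translation rephrasings",
    "val_aug": "VQAv2 augmented with Back Translation rephrasings",
    "trainval_aug": "VQAv2 augmented with Back Translation rephrasings",
    "revqa": "VQA-Rephrasings (Shah et al.)",
}


def describe_split(key: str) -> str:
    """Return the dataset description for a split key, or raise KeyError."""
    if key not in SPLIT_SOURCE:
        raise KeyError(f"unknown split key: {key!r}")
    return SPLIT_SOURCE[key]


if __name__ == "__main__":
    # Print the full mapping, one split key per line.
    for key in SPLIT_SOURCE:
        print(f"{key:>13} -> {describe_split(key)}")
```

Calling describe_split("train_aug") returns the Back Translation description, which answers the original question about splits/questions_train_aug.pkl: it is augmented VQAv2 data, not part of VQA-Rephrasings.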