OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

How to train VQA on my custom data?

xiaoqiang-lu opened this issue · comments

Hello! I am trying to finetune OFA-large on VQA with a custom dataset, following the finetuning instructions in the repo. I have checked my .tsv and .pkl files several times and they match the samples you provided. But after running "bash train_vqa_distributed.sh", the terminal just prints:

total_num_updates 40000
warmup_updates 1000
lr 5e-5
patch_image_size 480

GPU usage rises to a certain value, then suddenly drops back to zero, and the program ends. I am training on a single server with 2 GPUs. Looking forward to your reply, and thanks for sharing your work!

Hi, could you please provide the exact script you run on your machine and the type of your GPU cards? I will check it against my environment.

Moreover, for fine-tuning on custom VQA-formatted data, please also refer to the recent issue #76 for more information.

Thanks for your reply! At first I was using two 3080 Ti cards; I have now replaced them with four V100 cards, but the same problem occurs. The script on my machine:

GPUS_PER_NODE=4
WORKER_CNT=1
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=8214
export RANK=0

The rest is unchanged. I have also made my own ans2label.pkl file.
Here is a part of my .tsv file, without the imgbase64 field.
(screenshot of the .tsv rows attached)
Here is a part of my .pkl file.
(screenshot of the .pkl contents attached)
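For reference, a minimal sketch of building such an answer-to-label mapping, assuming (as with the provided trainval_ans2label.pkl) a pickled dict from answer string to integer label; the answer list below is made up:

import pickle

# Hypothetical answer vocabulary collected from the custom training set.
answers = ["yes", "no", "red", "two", "dog"]

# Assumed format: a plain dict mapping each candidate answer to an integer label.
ans2label = {ans: idx for idx, ans in enumerate(answers)}

with open("custom_ans2label.pkl", "wb") as f:
    pickle.dump(ans2label, f)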

Hi, have you checked the path of $log_file defined in your training script? The running log is saved to this file rather than printed to stdout. The program may have exited for another reason, which may be recorded in the log. Please share more information once you find the log file.

Thanks! It seems my images are causing this; I am using the code you posted in issue #56 to generate the imgbase64 field.
(screenshot attached)
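The exact snippet from issue #56 is not shown here, but a common way to generate the imgbase64 field (a sketch, not necessarily the code from that issue) looks like this:

import base64
from io import BytesIO

from PIL import Image

def image_to_base64(path):
    # Re-encode the image and return a base64 string for the imgbase64 TSV field.
    img = Image.open(path).convert("RGB")
    buffer = BytesIO()
    img.save(buffer, format="JPEG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")

# Hypothetical usage: print the first characters of the encoded string.
print(image_to_base64("example.jpg")[:60])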

I have solved the above problem, but now another one has occurred.
(screenshot attached)

Hi, please check whether the fields of the input data line that caused this error correspond to the specified selected_cols. By default, selected_cols is set to 0,5,2,3,4 in the script, which sequentially fetches the 0th (uniq_id), 5th (image), 2nd (question), 3rd (answer info), and 4th (predict_objects) fields from each input TSV line. If any of the fields mismatch, errors may occur.
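A quick way to check this locally is a sketch like the following (the TSV file name here is an assumption; adjust it to your own path):

selected_cols = [0, 5, 2, 3, 4]

# Inspect the first few lines of the (hypothetically named) custom TSV file.
with open("custom_train.tsv", encoding="utf-8") as f:
    for i, line in enumerate(f):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            print(f"line {i}: expected 6 fields, got {len(fields)}")
            continue
        uniq_id, image_b64, question, answer, objects = (fields[c] for c in selected_cols)
        print(i, uniq_id, question, answer, objects, image_b64[:20])
        if i >= 4:
            break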

I have checked the input data line, and it is the same as the example. I printed column_l and its length; column_l is [img_id, imgbase64, question, answer, objects], which looks correct.
(screenshot attached)

Hi, I think there is a misunderstanding of how each data line is organized. As mentioned in the readme, the fields in each line of the TSV file follow the exact order question-id, image-id, question, answer (with confidence), predicted object labels, and image base64 string, so there are 6 fields in total (and the image-id field is not used). By specifying selected_cols=0,5,2,3,4, the program sequentially fetches the 0th (question-id), 5th (image), 2nd (question), 3rd (answer info), and 4th (predicted objects) fields from each input TSV line, resulting in a sample that is further processed in the __getitem__ method of VqaGenDataset.
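To make that ordering concrete, here is a small sketch that appends one training line in this 6-field order; the values are illustrative only, and the '|!+' and '&&' separators are assumptions based on the sample data, so check them against the provided example file:

# Illustrative values only; verify the format against the provided sample TSV.
row = [
    "79459",                              # 0: question-id
    "79459",                              # 1: image-id (not used by the dataloader)
    "is this person wearing a hat?",      # 2: question
    "0.6|!+no",                           # 3: answer with confidence
    "wig&&hair&&woman",                   # 4: predicted object labels
    "iVBORw0KGgoAAAANSUhEUg...",          # 5: image base64 string (truncated here)
]

with open("custom_train.tsv", "a", encoding="utf-8") as f:
    f.write("\t".join(row) + "\n")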

By the way, when preparing the dataset TSV file, I would also recommend splitting each original training sample that has more than one golden answer into multiple samples, each containing only one of the answers. This takes full advantage of the supervision from all ground-truth answers of a training sample; otherwise, only the golden answer with the highest confidence score is used as supervision.
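A small sketch of that splitting step, assuming multiple golden answers are stored in the answer field as confidence|!+answer pairs joined by '&&' (an assumption based on the sample data; verify against your own file):

def split_multi_answer_row(fields):
    # Expand one 6-field row whose answer field holds several
    # confidence|!+answer pairs into one row per single answer.
    answers = fields[3].split("&&")
    return [fields[:3] + [ans] + fields[4:] for ans in answers]

# Illustrative usage with a made-up row (base64 string shortened).
sample = ["1", "1", "what color is the car?",
          "1.0|!+red&&0.6|!+maroon", "car&&road", "<imgbase64>"]
for new_row in split_multi_answer_row(sample):
    print("\t".join(new_row))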


How did you resolve this problem? I'm having the same problem. Thanks.