LuoweiZhou / VLP

Vision-Language Pre-training for Image Captioning and Question Answering


Can't reproduce the scst results

Leoncatd opened this issue

Hi, thanks for the great work.
I followed the steps in the README, but was unable to reproduce the SCST results (129.3 CIDEr in the paper). The best I got was 120.5 CIDEr, in the first epoch of SCST.

My script:
python vlp/run_img2txt_dist.py --output_dir $CHECKPOINT_ROOT/${checkpoint_coco_ce} --do_train --new_segment_ids --always_truncate_tail --amp --src_file $DATA_ROOT/COCO/annotations/dataset_coco.json --file_valid_jpgs $DATA_ROOT/COCO/annotations/coco_valid_jpgs.json --image_root $DATA_ROOT/COCO/region_feat_gvd_wo_bgd --enable_butd --s2s_prob 1 --bi_prob 0 --train_batch_size 16 --max_pred 0 --mask_prob 0 --scst --model_recover_path "coco_g8_lr3e-5_batch512_ft_from_s0.75_b0.25/model.28.bin"

I use 4 GPUs, the training runs for 30 epochs, and the batch size is set to 16. I set --model_recover_path to the pre-trained model you provided in the link.

I would like to know if I am doing something wrong that prevents me from reproducing the results in the paper. Thanks~

@Leoncatd SCST requires a much smaller learning rate than the default 3e-5. You may want to try setting --learning_rate 1e-6 as we noted in this table.

I have added a note in the README to avoid any future confusion: 7598a4e
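For reference, a sketch of the adjusted SCST command, i.e. your script above with --learning_rate 1e-6 appended (all other flags and paths are assumed to stay exactly as in your original command; adjust the output directory as you see fit):

# same command as above, with only the SCST learning rate changed to 1e-6
python vlp/run_img2txt_dist.py --output_dir $CHECKPOINT_ROOT/${checkpoint_coco_ce} --do_train --new_segment_ids --always_truncate_tail --amp --src_file $DATA_ROOT/COCO/annotations/dataset_coco.json --file_valid_jpgs $DATA_ROOT/COCO/annotations/coco_valid_jpgs.json --image_root $DATA_ROOT/COCO/region_feat_gvd_wo_bgd --enable_butd --s2s_prob 1 --bi_prob 0 --train_batch_size 16 --max_pred 0 --mask_prob 0 --scst --learning_rate 1e-6 --model_recover_path "coco_g8_lr3e-5_batch512_ft_from_s0.75_b0.25/model.28.bin"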

Thanks for the reply and info~ It works :)