zehao-wang / LAD

Official implementation of Layout-aware Dreamer for Embodied Referring Expression Grounding (AAAI'23).

Val_unseen evaluation getting worse

FreddyBanana opened this issue · comments

Hello!

I got a very strange result while running the script "final_frt_gd_finetuning_stable.sh". The log shows that the evaluation on the val_seen split keeps improving, but the performance on the val_unseen split keeps degrading (iteration 1000 remains the best one throughout training), as shown below.
[screenshot of the training log: val_seen metrics improving while val_unseen metrics degrade]
So I wonder if I made mistakes in setting the parameters in this script. Here are my settings:

export DATA_ROOT=../datasets
export timestamp=$(date +%m%d-%H%M%S)
export OUT_ROOT=../out/REVERIE/experiments

export train_alg=dagger
export features=clip
export ft_dim=768
export obj_features=vitbase
export obj_ft_dim=768
export rp_img_dir=../room_type_feats.h5

export ngpus=1
export seed=0 # default 0

export name=${train_alg}-${features}
export name=${name}-seed.${seed} #-${ngpus}gpus

export outdir=${OUT_ROOT}/reverie_finetune # path to save logs

export warmup_ckpt=../out/REVERIE/experiments/pretrain/frt_gd_phase2/ckpts/model_step_100000.pt

export flag="
  --root_dir ${DATA_ROOT}
  --dataset reverie
  --output_dir ${outdir}
  --world_size ${ngpus}
  --seed ${seed}
  --tokenizer bert

  --enc_full_graph
  --graph_sprels
  --fusion dynamic
  --multi_endpoints
  --use_room_type
  --use_img_room_head
  --dagger_sample sample

  --train_alg ${train_alg}
  
  --num_l_layers 9
  --num_x_layers 4
  --num_pano_layers 2
  --num_v_layers 4
  
  --max_action_len 15
  --max_instr_len 200
  --max_objects 20

  --batch_size 8
  --lr 1e-5
  --iters 50000
  --log_every 1000
  --optim adamW

  --features ${features}
  --obj_features ${obj_features}
  --image_feat_size ${ft_dim}
  --angle_feat_size 4
  --obj_feat_size ${obj_ft_dim}
  --rp_embed_dir ${rp_img_dir}

  --ml_weight 0.2
  --feat_dropout 0.4
  --dropout 0.5
  --gamma 0.
  
  --node_loss_delta 1.0
  --use_gd
  --stable_gd
  --use_real_dist_norm
  --num_of_ins_img 1
  --gd_dreamer_type attn_dynamic_fuse"

CUDA_VISIBLE_DEVICES=5 python nav_obj.py $flag  \
  --tokenizer bert \
  --bert_ckpt_file $warmup_ckpt

In the previous preprocessing and warmup steps, the only modification I made was to change the number of images generated by the goal dreamer from five to one. The only differences between the settings above and the original ones are "iters" and "num_of_ins_img".

Please help me modify the parameters, or just point out my mistakes.

Thank you very much!

Hi,
I am not sure what performance you got in the warmup stage. We observe that it is quite easy to overfit the training data, i.e. to reach high performance on the seen split. It is better to select a checkpoint with a balance between seen and unseen performance.
For example, the checkpoint I used had a warmup performance of val_seen facc 74.21 and val_unseen facc 53.
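For reference, here is a minimal sketch of how one could pick such a balanced checkpoint from the per-iteration validation numbers (the metric names and the history structure are hypothetical, not the exact format written by nav_obj.py):

# Hypothetical helper: pick the iteration whose seen/unseen results are most balanced.
# Assumes `history` maps iteration -> {"val_seen_sr": ..., "val_unseen_sr": ...};
# these key names are illustrative only.
def pick_balanced_ckpt(history):
    best_iter, best_score = None, float("-inf")
    for it, m in history.items():
        # reward high unseen performance, penalize a large seen/unseen gap
        score = m["val_unseen_sr"] - 0.5 * abs(m["val_seen_sr"] - m["val_unseen_sr"])
        if score > best_score:
            best_iter, best_score = it, score
    return best_iter

history = {
    1000: {"val_seen_sr": 55.0, "val_unseen_sr": 48.0},
    2000: {"val_seen_sr": 74.2, "val_unseen_sr": 43.0},
}
print(pick_balanced_ckpt(history))  # -> 1000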

In addition, using num_of_ins_img==1 probably lowers the generalization of the model, since the feature becomes too specific and cannot cope with bad-quality generated images.
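As a toy illustration of that point (this is not the repo's actual attn_dynamic_fuse mechanism, just a plain mean over hypothetical CLIP features), keeping several imagined goal images damps the effect of a single bad generation:

import numpy as np

# Toy sketch only: the real model fuses the imagined images with attention;
# a simple mean is used here just to show the robustness effect.
rng = np.random.default_rng(0)
goal = rng.normal(size=768)                    # hypothetical "true" goal feature
imgs = goal + 0.1 * rng.normal(size=(5, 768))  # 5 generated images with small noise
imgs[2] += 2.0 * rng.normal(size=768)          # one bad-quality generation

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("single (bad) image:", cos(imgs[2], goal))
print("mean of 5 images  :", cos(imgs.mean(axis=0), goal))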

May I ask if you can share the preprocessed CLIP features with me?
I only need "full_reverie_ins2img_clip.h5".
Using Goal Dreamer to generate five target images for each instruction is too time-consuming.

Hi,
I cleaned up some storage and put the files on the drive now. You can find the checkpoint of warmup stage 2 here and our extracted features here.
I tried to reproduce the results, though only for some iterations, and the performance behaves as expected: validation unseen does not go down at the beginning, so you can give it a try. There are some incorrect loss messages in the log; just ignore them.
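Once downloaded, a quick way to sanity-check the features before rerunning finetuning (this assumes full_reverie_ins2img_clip.h5 is a standard HDF5 file with one dataset per instruction id, which is a guess about its layout; adjust the path and keys to your setup):

import h5py

# Hypothetical path under DATA_ROOT; change it to wherever you place the file.
with h5py.File("../datasets/full_reverie_ins2img_clip.h5", "r") as f:
    keys = list(f.keys())
    print("num entries:", len(keys))
    # If the top-level entries are groups rather than datasets, inspect them further.
    print("first entry:", keys[0], getattr(f[keys[0]], "shape", None))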

Thank you very much!