seanzhuh / SeqTR

SeqTR: A Simple yet Universal Network for Visual Grounding

Home Page: https://arxiv.org/abs/2203.16265


size mismatch for head.transformer.seq_positional_encoding.embedding.weight:

CCYChongyanChen opened this issue · comments

Dear Author,
I am trying to take the RefCOCOg model (pre-trained + fine-tuned SeqTR segmentation), test it on the RefCOCO dataset, and visualize the results.

The command I run is:

python tools/inference.py /home/chch3470/SeqTR/configs/seqtr/segmentation/seqtr_segm_refcoco-unc.py /home/chch3470/SeqTR/work_dir/segm_best.pth --output-dir="/home/chch3470/SeqTR/attention_map_output" --with-gt --which-set="testA"

I get the error below. Do you have any idea why it happens? Is the RefCOCOg (pre-trained + fine-tuned SeqTR segmentation) model based on YOLO or DarkNet? If it is based on YOLO, which configs should we use? Also, should we change the vis_encs (currently the codebase only provides darknet.py for vis_encs)?

I can visualize the provided models for detection tasks so I guess I know the basic setups...

RuntimeError: Error(s) in loading state_dict for SeqTR:
size mismatch for lan_enc.embedding.weight: copying a param with shape torch.Size([12692, 300]) from checkpoint, the shape in current model is torch.Size([10344, 300]).
size mismatch for head.transformer.seq_positional_encoding.embedding.weight: copying a param with shape torch.Size([25, 256]) from checkpoint, the shape in current model is torch.Size([37, 256]).
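A mismatch like this can be diagnosed before calling load_state_dict by comparing the checkpoint's parameter shapes against the freshly built model's state dict. A minimal sketch of that check, using the shapes from the error message as stand-ins for torch.load(...)["state_dict"] and model.state_dict():

```python
# Minimal sketch: compare two state-dict-style shape maps and report
# mismatches. In practice the shapes would come from the loaded
# checkpoint and the instantiated model; here they are the two shapes
# reported in the error above (RefCOCOg checkpoint vs RefCOCO config).

def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {param_name: (ckpt_shape, model_shape)} for keys whose shapes differ."""
    return {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in ckpt_shapes.keys() & model_shapes.keys()
        if ckpt_shapes[name] != model_shapes[name]
    }

ckpt = {
    "lan_enc.embedding.weight": (12692, 300),
    "head.transformer.seq_positional_encoding.embedding.weight": (25, 256),
}
model = {
    "lan_enc.embedding.weight": (10344, 300),
    "head.transformer.seq_positional_encoding.embedding.weight": (37, 256),
}

for name, (c, m) in sorted(find_shape_mismatches(ckpt, model).items()):
    print(f"size mismatch for {name}: checkpoint {c} vs model {m}")
```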

In RefCOCOg we sample 12 points instead of the 18 points used for the RefCOCO dataset; you should use configs/seqtr/segmentation/seqtr_segm_refcocog-umd.py if you are testing a model trained on the RefCOCOg dataset. 12692 and 10344 are the numbers of distinct words (vocabulary sizes) of the two datasets.
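The two positional-encoding sizes in the error (25 vs 37) line up with the two point counts if each sampled contour point contributes an (x, y) token pair and one extra slot is reserved for the end token. This decomposition is an inference from the reported shapes, not taken from the SeqTR source:

```python
# Arithmetic behind the positional-encoding mismatch, assuming one
# (x, y) token pair per sampled point plus one end-of-sequence slot
# (an inference from the reported shapes, not verified in the code).

def seq_pos_embedding_rows(num_ray):
    return 2 * num_ray + 1  # x and y per point, plus the end token

print(seq_pos_embedding_rows(12))  # 25 -> the RefCOCOg checkpoint
print(seq_pos_embedding_rows(18))  # 37 -> the RefCOCO config
```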

Thank you so much for the quick reply.
Is it possible to test it on a customized dataset without fine-tuning?
Our experiment will have two settings: (1) use the pre-trained model and test directly on our dataset; (2) pre-training + fine-tuning on our dataset.

For example, if I take the model pre-trained on RefCOCOg and test it directly on RefCOCO without fine-tuning, could I just replace RefCOCO's two pkl files and word_emb with RefCOCOg's two pkl files and word_emb? Would that work?


Yes, that'll work, and you also need to change the sampled number of points in the configuration to match the checkpoint model.


Thank you so much! Just to confirm: in order to run the RefCOCOg model on another dataset (e.g., RefCOCO) I need to
(1) change num_ray=18 to num_ray=12 in refcoco-unc.py;
(2) set num_ray to 12 at line 1 and model.head.shuffle_fraction to 0.2 at line 35 of configs/seqtr/segmentation/seqtr_mask_darknet.py.
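For reference, the two edits above would look roughly like this in mmcv-style Python config syntax; the exact keys and nesting are assumptions based on the discussion, not verified against the SeqTR repo:

```python
# Sketch of the config edits described above (key names assumed from
# the thread, not checked against the actual SeqTR config files).
num_ray = 12  # was 18; must match the RefCOCOg checkpoint

model = dict(
    head=dict(
        shuffle_fraction=0.2,  # value the RefCOCOg checkpoint was trained with
    ),
)
```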

Do I need to change the max_token from 15 to 20?

---- update:
I made changes (1) and (2) and that worked.
I didn't change max_token.

Another question: do we need to disable LSJ and EMA for pre-training/fine-tuning on the segmentation tasks?
Or are LSJ and EMA only for training from scratch?