zengyan-97 / X-VLM

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Loading from pretrained warning

lezhang7 opened this issue

Hi, when I run the code, I get a lot of warnings:

```

Position interpolate vision_encoder.layers.0.blocks.0.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.0.blocks.1.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.1.blocks.0.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.1.blocks.1.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.0.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.1.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.2.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.3.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.4.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.5.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.6.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.7.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.8.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.9.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.10.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.11.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.12.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.13.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.14.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.15.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.16.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.2.blocks.17.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.3.blocks.0.attn.relative_position_bias_table from 13x13 to 23x23
Position interpolate vision_encoder.layers.3.blocks.1.attn.relative_position_bias_table from 13x13 to 23x23
### Loading pretrained text encoder
load checkpoint from 16m_base_model_state_step_199999.th
missing_keys:  []
unexpected_keys:  ['bbox_head.0.weight', 'bbox_head.0.bias', 'bbox_head.1.weight', 'bbox_head.1.bias', 'bbox_head.3.weight', 'bbox_head.3.bias', 'text_encoder.cls.predictions.bias', 'text_encoder.cls.predictions.transform.dense.weight', 'text_encoder.cls.predictions.transform.dense.bias', 'text_encoder.cls.predictions.transform.LayerNorm.weight', 'text_encoder.cls.predictions.transform.LayerNorm.bias', 'text_encoder.cls.predictions.decoder.weight', 'text_encoder.cls.predictions.decoder.bias']
```
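
For context, my understanding is that the "Position interpolate" messages come from the checkpoint loader resizing the Swin relative position bias tables when the fine-tuning window size differs from pre-training (13 = 2×7−1 for window 7, 23 = 2×12−1 for window 12). Below is a minimal sketch of what I assume is happening; the function name and shapes are my guesses, not the repo's exact code:

```python
import torch
import torch.nn.functional as F

def interpolate_rel_pos_bias(table: torch.Tensor, dst_size: int) -> torch.Tensor:
    """Resize a Swin relative_position_bias_table from (src*src, heads)
    to (dst*dst, heads) with bicubic interpolation.

    Hypothetical sketch of what the "Position interpolate ... from 13x13
    to 23x23" messages describe; not the repo's exact implementation.
    """
    src_len, num_heads = table.shape            # e.g. (169, heads) -> src = 13
    src_size = int(src_len ** 0.5)
    # (src*src, heads) -> (1, heads, src, src) so F.interpolate can treat it as an image
    grid = table.permute(1, 0).reshape(1, num_heads, src_size, src_size)
    grid = F.interpolate(grid, size=(dst_size, dst_size),
                         mode='bicubic', align_corners=False)
    # back to (dst*dst, heads)
    return grid.reshape(num_heads, dst_size * dst_size).permute(1, 0)

# Toy 13x13 table (window size 7: 2*7-1 = 13) resized for window size 12 (2*12-1 = 23)
old = torch.randn(13 * 13, 12)
new = interpolate_rel_pos_bias(old, 23)
print(old.shape, '->', new.shape)   # torch.Size([169, 12]) -> torch.Size([529, 12])
```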
Will this be a problem for fine-tuning?
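
P.S. If it matters, I believe the missing_keys / unexpected_keys lists are just what PyTorch's load_state_dict(..., strict=False) returns when the checkpoint contains parameters the model doesn't define (here, the pre-training bbox_head and MLM prediction head). A toy reproduction of that behavior (module names are made up):

```python
import torch.nn as nn

# Pre-training model has an extra head that the fine-tuning model drops.
pretrain = nn.ModuleDict({'backbone': nn.Linear(4, 4), 'bbox_head': nn.Linear(4, 4)})
finetune = nn.ModuleDict({'backbone': nn.Linear(4, 4)})

result = finetune.load_state_dict(pretrain.state_dict(), strict=False)
print('missing_keys:   ', result.missing_keys)     # []
print('unexpected_keys:', result.unexpected_keys)  # ['bbox_head.weight', 'bbox_head.bias']
```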