yangdongchao / Text-to-sound-Synthesis

Source code for our paper "Diffsound: Discrete Diffusion Model for Text-to-Sound Generation"

Home Page: http://dongchaoyang.top/text-to-sound-synthesis-demo/

Missing required files for audiocaption evaluation

jzq2000 opened this issue · comments

The required files referenced in the AudiocaptionLoss config are missing:

path:
  vocabulary: 'data/pickles/words_list.p'
  encoder: 'pretrained_models/audioset_deit.pth'  # 'pretrained_models/deit.pth'
  word2vec: 'pretrained_models/word2vec/w2v_512.model'
  eval_model: 'pretrained_models/ACTm.pth'
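
A minimal sketch for checking which of these files are present before running evaluation (this assumes the config is plain YAML loaded with PyYAML, with a top-level path section as above; 'caps_config.yaml' is just a placeholder for the actual config file name):

# Check that every path in the AudiocaptionLoss config exists.
import os
import yaml

with open('caps_config.yaml') as f:   # placeholder name for the config file
    cfg = yaml.safe_load(f)

for key, p in cfg['path'].items():
    print(f"{key:12s} {'OK' if os.path.exists(p) else 'MISSING':8s} {p}")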

Hi, please refer to https://disk.pku.edu.cn/link/4908743A441B02235C8652742FE44949 . Also, you can refer to https://github.com/XinhaoMei/ACT

Thanks a lot~ BTW, could you please also provide '/apdcephfs/share_1316500/donchaoyang/code3/ACT/outputs/exp_4/model/best_model.pth', which is referenced in settings2.yaml?

Otherwise, errors occur when loading ACTm.pth:

RuntimeError: Error(s) in loading state_dict for AudioTransformer_80:
        size mismatch for pos_embedding: copying a param with shape torch.Size([1, 126, 768]) from checkpoint, the shape in current model is torch.Size([1, 216, 768]).
        size mismatch for bn0.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for bn0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for bn0.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for bn0.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([768, 320]).
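
For what it's worth, the mismatch pattern (64 vs. 80 channels in bn0, 256 vs. 320 in patch_embed.proj) looks like a checkpoint trained on 64 mel bins being loaded into the 80-mel-bin AudioTransformer_80, which would explain why the best_model.pth from settings2.yaml is needed. A small sketch to confirm what ACTm.pth actually stores (whether the weights are nested under a 'model' key is a guess, so it is only unwrapped if that key is present):

import torch

# Inspect the shapes stored in ACTm.pth for the keys from the error above.
ckpt = torch.load('pretrained_models/ACTm.pth', map_location='cpu')
state = ckpt['model'] if isinstance(ckpt, dict) and 'model' in ckpt else ckpt

# Expected by the current AudioTransformer_80 (taken from the traceback):
# pos_embedding (1, 216, 768), bn0.* (80,), patch_embed.proj.weight (768, 320)
for name in ('pos_embedding', 'bn0.weight', 'patch_embed.proj.weight'):
    if name in state:
        print(name, tuple(state[name].shape))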

Have you solved this issue? I ran into the same problem: best_model.pth is missing when computing the ACT loss metrics.