Missing required files for audiocaption evaluation
jzq2000 opened this issue · comments
The required files in AudiocaptionLoss config are missing.
```yaml
path:
  vocabulary: 'data/pickles/words_list.p'
  encoder: 'pretrained_models/audioset_deit.pth' # 'pretrained_models/deit.pth'
  word2vec: 'pretrained_models/word2vec/w2v_512.model'
  eval_model: 'pretrained_models/ACTm.pth'
```
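Before running the evaluation, it can save time to check which of these files are actually present on disk. The following is a minimal sketch (the helper name and the dict are my own; the paths mirror the config above):

```python
import os

# Paths taken from the AudiocaptionLoss config above.
REQUIRED_FILES = {
    "vocabulary": "data/pickles/words_list.p",
    "encoder": "pretrained_models/audioset_deit.pth",
    "word2vec": "pretrained_models/word2vec/w2v_512.model",
    "eval_model": "pretrained_models/ACTm.pth",
}

def missing_files(paths: dict) -> list:
    """Return the config keys whose files do not exist on disk."""
    return [name for name, p in paths.items() if not os.path.isfile(p)]
```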
Hi, please refer to https://disk.pku.edu.cn/link/4908743A441B02235C8652742FE44949 . Also, you can refer to https://github.com/XinhaoMei/ACT
Thanks a lot~ BTW, could you please also provide '/apdcephfs/share_1316500/donchaoyang/code3/ACT/outputs/exp_4/model/best_model.pth', which is referenced in settings2.yaml? Otherwise, errors occur when loading ACTm.pth:
```
RuntimeError: Error(s) in loading state_dict for AudioTransformer_80:
    size mismatch for pos_embedding: copying a param with shape torch.Size([1, 126, 768]) from checkpoint, the shape in current model is torch.Size([1, 216, 768]).
    size mismatch for bn0.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
    size mismatch for bn0.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
    size mismatch for bn0.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
    size mismatch for bn0.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
    size mismatch for patch_embed.proj.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([768, 320]).
```
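For what it's worth, these mismatches (64 vs. 80 in `bn0.*`, 256 vs. 320 in `patch_embed.proj.weight`) suggest the checkpoint was trained with a different mel-bin setting than the current model. A quick way to enumerate all mismatched tensors before calling `load_state_dict` is a sketch like this (not ACT code; the helper name is my own):

```python
import torch
import torch.nn as nn

def find_shape_mismatches(model: nn.Module, ckpt_path: str):
    """Return (name, checkpoint_shape, model_shape) for every tensor whose
    shape in the checkpoint differs from the current model's state_dict."""
    state = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints nest the weights under a wrapper key.
    for key in ("model", "state_dict"):
        if isinstance(state, dict) and key in state:
            state = state[key]
    own = model.state_dict()
    return [
        (name, tuple(t.shape), tuple(own[name].shape))
        for name, t in state.items()
        if name in own and own[name].shape != t.shape
    ]
```

Running this on ACTm.pth against the instantiated `AudioTransformer_80` would list exactly the parameters in the traceback above, which makes it easier to see whether the model config or the checkpoint is the wrong one.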
Have you solved this issue? I ran into the same problem: best_model.pth is missing when computing the ACT loss metrics.